The Data Science Method (DSM) - Pre-processing and Training Data Development

Aiden V Johnson
5 min read · Apr 15, 2019
Photo by Kevin Jarrett on Unsplash

This is the fourth article in a series about taking your data science projects to the next level with a methodological approach, similar to the scientific method, called the Data Science Method. This article focuses on pre-processing the model development dataset and on training data development. If you missed the previous articles in this series, you can go to the beginning here, or click on a step title below to read about a specific step in the process.

The Data Science Method

  1. Problem Identification
  2. Data Collection, Organization, and Definitions
  3. Exploratory Data Analysis
  4. Pre-processing and Training Data Development
  5. Modeling
  6. Documentation

Pre-processing is the step of standardizing your model development dataset. It applies when your numeric features differ in magnitude and when your data mixes categorical and continuous variables. This is also the juncture where other numeric transformations would be applied to meet…
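
To make the two situations above concrete, here is a minimal sketch in plain Python. It assumes z-score standardization for numeric features and one-hot encoding for categorical ones; these are common choices, not necessarily the ones used in this series. The helper names (`standardize`, `one_hot`) and the toy data are hypothetical. In practice you would likely reach for a library such as scikit-learn, which offers `StandardScaler` and `OneHotEncoder` for the same purpose.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a numeric column to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant column carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Turn a categorical column into one 0/1 indicator column per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical toy data: two numeric features with very different magnitudes
# and one categorical feature.
age = [25, 35, 45, 55]
income = [30_000, 50_000, 70_000, 90_000]
city = ["Denver", "Austin", "Denver", "Boston"]

age_scaled = standardize(age)
income_scaled = standardize(income)
city_encoded = one_hot(city)

print(age_scaled)
print(income_scaled)
print(city_encoded)
```

After scaling, `age` and `income` sit on the same scale even though their raw magnitudes differ by three orders, and `city` becomes a set of numeric indicator columns that can sit alongside the continuous features.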
