Data Preprocessing (Taking Care Of Missing Values) ~ Practical Machine Learning

Friday, November 17, 2017

The first problem we have to deal with is missing data in our data set, which happens quite often in real life. So we need a trick to handle it, so that our machine learning model can run correctly. If you still have a problem importing the dataset, see the previous tutorial. And here is the dataset.

As we can see, there are two missing values in the data set: one in the age column for Spain and one in the salary column for Germany. So we need a good way to handle them. The most common idea is to replace each missing value with the mean of its column.
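To make the idea concrete before bringing in a library, here is a minimal pure-Python sketch of mean imputation on a single column. The helper name fill_with_mean and the age values are made up for illustration; they are not taken from the tutorial's dataset.

```python
import math
from statistics import mean


def fill_with_mean(column):
    """Return a copy of `column` with NaN entries replaced by the mean
    of the values that are present (hypothetical helper)."""
    known = [value for value in column if not math.isnan(value)]
    column_mean = mean(known)
    return [column_mean if math.isnan(value) else value for value in column]


# Illustrative ages only; the second entry is missing.
ages = [40.0, float("nan"), 30.0, 50.0]
filled_ages = fill_with_mean(ages)
print(filled_ages)  # the missing age becomes 40.0, the mean of 40, 30 and 50
```

The mean is computed only from the values that exist, which is also what a library imputer does when it is fitted on a column.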

So, as usual, we are going to use a library to do this job for us. The library we are going to use this time is scikit-learn's preprocessing library, from which we import the Imputer class.

sklearn is scikit-learn, which contains amazing libraries for building machine learning models, and its preprocessing library contains a lot of classes and methods to preprocess any dataset. From this library we import the Imputer class, which will allow us to take care of those missing values.
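As a rough sketch of the import and setup step: when this post was written, the class was sklearn.preprocessing.Imputer, but that class was removed in scikit-learn 0.22. The snippet below assumes a current scikit-learn and uses its replacement, SimpleImputer from sklearn.impute, so it is not the exact code from the original tutorial.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Assumption: scikit-learn 0.22 or newer. On the 2017-era versions the
# equivalent import was `from sklearn.preprocessing import Imputer`.
# Missing entries are marked with NaN and replaced by the column mean.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
print(imputer.strategy)
```

Creating the object does not touch any data yet; the means are only computed when fit() is called.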

imputer.fit(X[:, 1:3]) means we fit the imputer only on the columns that contain missing values, which are column indexes 1 and 2. Why do we use index 3? Because in Python the upper bound of a slice is excluded.

imputer.transform(X[:, 1:3]) means we replace the NaN values with the mean of each column using the transform() method. transform() returns a new array, so we assign the result back to X[:, 1:3].
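The fit and transform steps can be sketched end to end as follows. The rows in X are invented stand-ins for the tutorial's dataset (only the positions of the missing values follow the text), and SimpleImputer is assumed in place of the older Imputer class.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: column 0 is the country, column 1 the age,
# column 2 the salary. Spain is missing an age, Germany a salary.
X = np.array(
    [
        ["France", 40.0, 60000.0],
        ["Spain", np.nan, 50000.0],
        ["Germany", 30.0, np.nan],
        ["Spain", 50.0, 70000.0],
    ],
    dtype=object,
)

# X[:, 1:3] selects every row and columns 1 and 2 (the upper bound 3 is excluded).
numeric = X[:, 1:3].astype(float)

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputer.fit(numeric)                     # learns the mean of each column
X[:, 1:3] = imputer.transform(numeric)   # writes the filled values back into X

print(X)
```

With these made-up numbers, the missing age becomes 40.0 and the missing salary becomes 60000.0, while the country column is left untouched.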

Here is the result:

Congratulations, now you know how to take care of missing values in Python. You can have fun and try another strategy, such as median or most_frequent. See you in the next tutorial.
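If you want to try the other strategies, a small comparison might look like this. The ages are made up so that the three strategies give different answers, and SimpleImputer from a current scikit-learn is assumed.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative single column with one missing value in row 3.
ages = np.array([[30.0], [30.0], [50.0], [np.nan], [70.0]])

results = {}
for strategy in ("mean", "median", "most_frequent"):
    imputer = SimpleImputer(missing_values=np.nan, strategy=strategy)
    filled = imputer.fit_transform(ages)  # fit and transform in one call
    results[strategy] = float(filled[3, 0])  # the value that replaced NaN

print(results)
```

Here the mean gives 45.0, the median 40.0 and most_frequent 30.0. The median is less sensitive to extreme values, and most_frequent is usually the more natural choice for categorical columns.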
