Data Preprocessing (Importing Dataset) ~ Practical Machine Learning

Wednesday, November 15, 2017

Data Preprocessing (Importing Dataset)


As I explained in the previous tutorial, pandas is the library we use to import the data set. We declare a new variable that holds the data set itself and simply call it "dataset".

We use "pd", the usual short alias for pandas, and call its read_csv() method on the dataset file, Data.csv.
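As a minimal sketch of this import step: the rows below are invented placeholder values standing in for Data.csv (only the four column names come from this tutorial), and they are read from an in-memory string so the example runs without the file.

```python
import io

import pandas as pd

# Stand-in for Data.csv. ASSUMPTION: every value below is an invented
# placeholder; only the column names are taken from the tutorial.
csv_text = """Country,Age,Salary,Purchased
A,25,40000,No
B,30,45000,Yes
C,35,50000,No
A,40,55000,Yes
B,45,60000,No
C,50,65000,Yes
A,28,42000,No
B,33,47000,Yes
C,38,52000,No
A,43,57000,Yes
"""

# read_csv accepts a file path or a file-like object.
dataset = pd.read_csv(io.StringIO(csv_text))

print(dataset.head())
```

With the real file in the working directory, the call would simply be pd.read_csv("Data.csv").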

The data set has four columns (Country, Age, Salary, and Purchased) and ten observations (rows). Keep in mind that indexes in Python start at zero, so the rows are numbered 0 to 9 and the columns 0 to 3.
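One way to check the size of the data set and see the zero-based indexing is sketched below. The DataFrame is a hypothetical stand-in with invented values, built by hand so the example does not need Data.csv.

```python
import pandas as pd

# ASSUMPTION: invented placeholder values; only the column names
# and the 10 x 4 layout follow the tutorial.
dataset = pd.DataFrame({
    "Country": ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
    "Age": [25, 30, 35, 40, 45, 50, 28, 33, 38, 43],
    "Salary": [40000, 45000, 50000, 55000, 60000,
               65000, 42000, 47000, 52000, 57000],
    "Purchased": ["No", "Yes", "No", "Yes", "No",
                  "Yes", "No", "Yes", "No", "Yes"],
})

n_rows, n_cols = dataset.shape      # (number of rows, number of columns)
first_row_label = dataset.index[0]  # default row labels start at 0
last_row_label = dataset.index[-1]  # so the tenth row has label 9
first_column = dataset.columns[0]   # column position 0 is "Country"
last_column = dataset.columns[3]    # column position 3 is "Purchased"

print(n_rows, n_cols, first_row_label, last_row_label)
```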

There is something very important to understand about machine learning in Python: we have a data set, but we need to distinguish the matrix of features from the dependent variable vector. We create the matrix of the three independent variables and simply call it "X". We also create the dependent variable vector, "y", which is the last column with its ten observations.

Here is how the selection works. For "X" (the independent variables), we take all the rows, and -1 means we leave out the last column, so we keep only the first three columns. For "y" (the dependent variable), 3 means we take only the column at index three.
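The selection described above can be sketched with pandas' position-based indexer, iloc. This is an assumption about how the slicing was written, since the original code is not reproduced in this text, and the DataFrame again holds invented placeholder values.

```python
import pandas as pd

# ASSUMPTION: invented placeholder values standing in for Data.csv.
dataset = pd.DataFrame({
    "Country": ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
    "Age": [25, 30, 35, 40, 45, 50, 28, 33, 38, 43],
    "Salary": [40000, 45000, 50000, 55000, 60000,
               65000, 42000, 47000, 52000, 57000],
    "Purchased": ["No", "Yes", "No", "Yes", "No",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# ":" selects all rows; ":-1" selects every column except the last.
X = dataset.iloc[:, :-1].values

# ":" selects all rows; 3 selects only the column at index 3.
y = dataset.iloc[:, 3].values

print(X.shape)  # (10, 3)
print(y.shape)  # (10,)
```

Here .values converts the selection to a NumPy array, so X is a two-dimensional array of features and y is a one-dimensional vector.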

OK, we have imported the data set and prepared the data correctly. See you in the next tutorial.
