All About Using Python & R for Machine Learning, Data Science, Data Analyst, Deep Learning, Artificial Intelligence

Wednesday, November 15, 2017

Data Preprocessing (Importing Dataset)

As I explained in the previous tutorial, the best library for importing a data set is pandas. We are going to declare a new variable that holds the data set itself and simply call it "dataset".


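We use "pd", the usual shortcut for pandas, and call its read_csv() method on the file, that is, dataset = pd.read_csv('Data.csv'). The dataset file is Data.csv, and it has to be in the working folder, otherwise you need to pass the full path.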
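If you want to try read_csv() without having the file, here is a minimal sketch. The rows below are made up by me for illustration (only the four column names come from this tutorial), and io.StringIO just pretends to be the file on disk.

```python
import io
import pandas as pd

# Hypothetical stand-in for Data.csv: made-up rows, same four column names.
csv_text = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
"""

# With the real file on disk this would be: dataset = pd.read_csv("Data.csv")
dataset = pd.read_csv(io.StringIO(csv_text))

# Show what was loaded: one row per observation, one column per field.
print(dataset)
```

read_csv() accepts either a file path or a file-like object, which is why the same call works for both cases.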


So, we have four columns: Country, Age, Salary, and Purchased. We also have ten observations (rows). You have to understand that indexing in Python starts at zero, so the first row has index 0 and the last of the ten rows has index 9.
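A quick way to check the number of rows and columns, and to see the zero-based index in action, is sketched below. The three rows are made-up sample values (the real Data.csv has ten rows), so treat the numbers as illustration only.

```python
import pandas as pd

# Made-up sample with the tutorial's four columns (the real file has ten rows).
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany"],
    "Age": [44, 27, 30],
    "Salary": [72000, 48000, 54000],
    "Purchased": ["No", "Yes", "No"],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = dataset.shape

# Positions start at zero: position 0 is the first row,
# and position n_rows - 1 is the last one.
first_row = dataset.iloc[0]
last_row = dataset.iloc[n_rows - 1]

print(n_rows, n_cols)
print(first_row["Country"], last_row["Country"])
```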

There is something very important to understand about machine learning in Python: we have a dataset, but we need to separate the matrix of features from the dependent variable vector. We are going to create the matrix of the three independent variables (Country, Age, and Salary) and simply call it "X". We also create the dependent variable vector "y", which is the last column (Purchased) with its ten observations.



The code works as follows:


For the variable "X" (the independent variables), we take all the rows of the data, and -1 means we leave out the last column, so we get only the first three columns. For the variable "y" (the dependent variable), 3 means we take only the column at index three, which is the fourth and last column.
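The split described above can be sketched with iloc like this. I am assuming the usual iloc slicing here, and the sample rows are again made up, so only the column layout matches the tutorial's dataset.

```python
import pandas as pd

# Made-up sample with the tutorial's four columns.
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany"],
    "Age": [44, 27, 30],
    "Salary": [72000, 48000, 54000],
    "Purchased": ["No", "Yes", "No"],
})

# ":" takes all the rows, ":-1" takes every column except the last one.
X = dataset.iloc[:, :-1].values

# ":" takes all the rows, 3 takes only the column at index three (Purchased).
y = dataset.iloc[:, 3].values

print(X.shape)  # matrix of features: rows x 3 columns
print(y.shape)  # dependent variable vector: one value per row
```

Adding .values turns the selection into a NumPy array instead of a pandas object, which is the form most machine learning code expects.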



OK, we have imported the data set and prepared the data correctly. See you in the next tutorial.