Differences

This shows you the differences between two versions of the page.

forecasting:data_preprocessing [2020/11/11 19:25]
kmacloves [Encoding Categorical Data]
forecasting:data_preprocessing [2021/09/19 21:59] (current)
Line 19:
 Since the level corresponds to the position, we don't care about the first column. Let's say that the Salary is the dependent variable and Level is the independent variable that Salary depends on.
  
-==== Step 1: Import the Data ====
+Step 1: Import the Data
 Importing the data is done using the Pandas library:
  
 |dataset = pd.read_csv('nameOfDatasheet.csv')|
-==== Step 2: Select the values ==== 
  
+Step 2: Select the values
 //iloc// is an indexer in the pandas library that selects values from a DataFrame by integer position, using the row and column indexes given in the brackets. Since we want the independent variable, x, to hold all the rows and just the second column, we can select the values like this:
  
 |x = dataset.iloc[:, 1:-1].values|
  
-similarly since we want all the rows for the dependent variable and just the last column we can find the values like this:
+Similarly since we want all the rows for the dependent variable and just the last column we can find the values like this:
  
 |y = dataset.iloc[:, -1].values|
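
As a rough illustration of Steps 1 and 2 together, the sketch below runs the same two //iloc// selections on a tiny in-memory table. The three rows and the column names (Position, Level, Salary) are hypothetical stand-ins for 'nameOfDatasheet.csv', chosen only to match the layout described above.

```python
import io

import pandas as pd

# Hypothetical three-column sheet standing in for 'nameOfDatasheet.csv'.
csv_text = """Position,Level,Salary
Analyst,1,45000
Consultant,2,50000
Manager,3,60000
"""
dataset = pd.read_csv(io.StringIO(csv_text))

# All rows, columns from index 1 up to (but not including) the last one.
x = dataset.iloc[:, 1:-1].values
# All rows, last column only.
y = dataset.iloc[:, -1].values

print(x.shape)  # 2-D array: one row per record, one column (Level)
print(y.shape)  # 1-D array: one Salary value per record
```

Note that the slice 1:-1 keeps x two-dimensional even though it holds a single column, while the plain index -1 returns y as a one-dimensional array.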
Line 65:
 To represent one country, we set its corresponding position to 1 and all other positions to 0:
  
-|France: [1 0 0] |
-|Spain: [0 1 0]  |
-|Germany: [0 0 1]|
+  France: [1 0 0]
+   Spain: [0 1 0]  |
+ Germany: [0 0 1]  |
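
The mapping in the table above can be reproduced by hand with a minimal sketch in plain Python. The helper name one_hot is hypothetical and used only for illustration; a library encoder may choose its own column order from the data, so its output columns need not follow the France, Spain, Germany order used here.

```python
# Category order taken from the table above.
countries = ["France", "Spain", "Germany"]


def one_hot(country, categories):
    """Return a list with 1 at the category's position and 0 elsewhere."""
    return [1 if c == country else 0 for c in categories]


for name in countries:
    print(name, one_hot(name, countries))
```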
  
 To apply one-hot encoding in Python, we can use the OneHotEncoder class from the scikit-learn library:
  • Last modified: 2021/09/19 21:59