===== Clustering =====
**Clustering** is a statistical process used to group "similar" data points together within a dataset. This process is similar in nature to classification, where similar data points are grouped together and then assigned a specific label (e.g. dogs or cats). However, clustering is different in the sense that the distinct groups are not given any labels; they are simply referred to as //Cluster 1, Cluster 2, Cluster 3, etc.// Note that in clustering, we do not care about any dependent variables because we just want to see which **inputs** are similar to one another.

Clustering methods discussed on this page:
  * K-Means Clustering
  * Hierarchical Clustering

==== K-Means Clustering ====
**Form:** Universal. Can be used with any dataset\\
**When to use it:** Used when you want to define a specific number of clusters in the dataset (e.g. maybe you specifically want to see only 5 different clusters). However, you can also use the **Elbow Method** to numerically determine an optimal number of clusters for your dataset.\\
**Library used:** [[https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans|sklearn.cluster.KMeans]]\\
**General workflow:**
  - Import the **KMeans** class from **sklearn.cluster**
  - Create an instance of the **KMeans()** class with your chosen number of clusters
  - Apply the **.fit_predict** method to your independent variables
  - To choose the number of clusters, use the **Elbow Method**: fit **KMeans** for a range of cluster counts, plot the within-cluster sum of squares (the model's **inertia_**) against the count, and pick the point where the curve bends (the "elbow"), as shown in the sketch below
**Sample code and output:**\\
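A minimal sketch of the workflow above (not the page's original sample code): the synthetic data from **make_blobs** stands in for your own independent variables, and the 1 to 10 cluster range tried for the Elbow Method as well as the final choice of 5 clusters are illustrative assumptions.

<code python>
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Placeholder independent variables: 300 points scattered around 5 centers
X, _ = make_blobs(n_samples=300, centers=5, random_state=42)

# Elbow Method: fit KMeans for several cluster counts and record the
# within-cluster sum of squares (inertia_). The "elbow" of the curve
# suggests a reasonable number of clusters.
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X)
    wcss.append(km.inertia_)

plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("Number of clusters")
plt.ylabel("WCSS (inertia)")
plt.title("Elbow Method")
plt.show()

# Fit the final model with the chosen number of clusters; fit_predict
# returns the cluster index (0, 1, 2, ...) assigned to each data point.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print(labels[:10])
</code>

Note that no dependent variable is passed anywhere in this sketch; only the inputs are clustered.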

//Note that we can easily visualize our results by using a 2D plot. This type of plot is only possible when we have no more than **2** independent variables (see the sketch below).//

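A short sketch of the 2D visualization mentioned above, assuming exactly two independent variables; the data and cluster count are the same illustrative assumptions as in the previous sketch.

<code python>
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Placeholder inputs with exactly 2 columns (independent variables)
X, _ = make_blobs(n_samples=300, centers=5, random_state=42)
labels = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(X)

# Colour each point by the cluster index it was assigned to
plt.scatter(X[:, 0], X[:, 1], c=labels, s=20)
plt.xlabel("Independent variable 1")
plt.ylabel("Independent variable 2")
plt.title("K-Means clusters")
plt.show()
</code>
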
==== Hierarchical Clustering ====
**Form:** Universal. Can be used with any dataset\\
**When to use it:** Used when you do not want to fix the number of clusters in advance. The algorithm builds a hierarchy (tree) of clusters, and a **dendrogram** can be used to decide how many clusters to keep.\\
**Library used:** [[https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html|sklearn.cluster.AgglomerativeClustering]]\\
**General workflow:**
  - Import the **AgglomerativeClustering** class from **sklearn.cluster**
  - (Optional) Plot a **dendrogram** (e.g. with **scipy.cluster.hierarchy**) to determine an optimal number of clusters
  - Create an instance of the **AgglomerativeClustering()** class with your chosen number of clusters
  - Apply the **.fit_predict** method to your independent variables (see the sketch below)
**Sample code and output:**\\
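A minimal sketch of the workflow above, under the same kind of illustrative assumptions as the K-Means example: synthetic **make_blobs** data stands in for your independent variables, and the Ward linkage and choice of 3 clusters are assumptions made for the example.

<code python>
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Placeholder independent variables: 150 points scattered around 3 centers
X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

# Dendrogram (from scipy) to help choose the number of clusters: cutting the
# tree where the vertical merge distances are largest suggests a cluster count.
sch.dendrogram(sch.linkage(X, method="ward"))
plt.title("Dendrogram")
plt.xlabel("Data points")
plt.ylabel("Euclidean distance")
plt.show()

# Fit agglomerative (hierarchical) clustering; fit_predict returns the
# cluster index assigned to each data point.
hc = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = hc.fit_predict(X)
print(labels[:10])
</code>
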
//Note that we can easily visualize our results by using a 2D plot. This type of plot is only possible when we have no more than **2** independent variables.//
  
  