===== Clustering =====
**Clustering** is a statistical process used to group "similar" data points together within a dataset. It is similar in nature to classification, where similar data points are grouped together and then assigned a specific label (e.g. dogs or cats). However, clustering differs in that the resulting groups are not given meaningful labels; they are simply referred to as //Cluster 1, Cluster 2, Cluster 3, etc//. Note that in clustering we do not use a dependent variable, because we only want to see which **inputs** are similar to one another.
Clustering methods discussed on this page:
  * K-Means Clustering
  * Hierarchical Clustering
+ | |||
+ | ==== K-Means Clustering ==== | ||
+ | **Form:** Universal. Can be used with any dataset\\ | ||
+ | **When to use it:** Used when you want to define a specific number of clusters in the dataset (eg: maybe you specifically want to see only 5 different clusters). However, you can also use the **Elbow Method** to numerically determine an optimal number of clusters for your dataset.\\ | ||
+ | **Library used:** [[https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans|sklearn.clustering.KMeans]]\\ | ||
+ | **General workflow:** | ||
+ | - Import **KMeans** class from **sklearn.clustering** | ||
+ | - Create an instance of the **KMeans()** class | ||
+ | - Apply the **.fit_predict** method to your independent variables | ||
+ | - **//__ TALK ABOUT ELBOW METHOD HERE__//** | ||
+ | **Sample code and output:**\\ | ||
+ | |||
+ | //Note that we can easily visualize our results by using a 2D plot. This type of plot is only possible when we have no more than **2** independent variable.// | ||
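The K-Means workflow can be sketched as follows. This is a minimal illustration, not the page's own sample code: the dataset is synthetic (generated with ''make_blobs''), and all variable names and parameter choices are assumptions.

```python
# Minimal K-Means sketch on a synthetic 2-feature dataset.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in dataset: 150 points in 2 features, drawn from 3 blobs
X, _ = make_blobs(n_samples=150, centers=3, n_features=2, random_state=42)

# Elbow Method: fit K-Means for k = 1..8 and record the inertia
# (within-cluster sum of squares); the bend in this curve suggests k
inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X)
    inertias.append(km.inertia_)

# Fit the final model with the chosen number of clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)  # one cluster index (0, 1, or 2) per point
```

With only 2 features, ''X'' can be scatter-plotted and colored by ''labels'' to see the clusters directly.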
+ | |||
+ | ==== Hierarchical Clustering ==== | ||
+ | **Form:** y = w0 + w1*x1\\ | ||
+ | **When to use it:** Used when there is only one independent variable whose degree is assumed to be 1\\ | ||
+ | **Library used:** [[https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html|sklearn.linear_model.LinearRegression]]\\ | ||
+ | **General workflow:** | ||
+ | - Import **LinearRegression** library from **sklearn.linear_model** | ||
+ | - Create an instance of the **LinearRegression()** class | ||
+ | - Apply the **.fit** method to your independent and dependent variables | ||
+ | - Apply the **.predict** method to your regressor to make any predictions about your data | ||
+ | **Sample code and output:**\\ | ||
+ | {{ :forecasting:simpregcode.png?1600 |}} | ||
+ | {{ :forecasting:simpleregressoutput.png |}}\\ | ||
+ | //Note that we can easily visualize our results by using a 2D plot. This type of plot is only possible when we have no more than **1** independent variable.// | ||
+ | |||