===== Clustering =====

**Clustering** is a statistical process used to group "similar" data points together within a dataset. It resembles classification, where similar data points are grouped together and then assigned a specific label (e.g., dogs or cats). Clustering differs in that the groups are not given any labels; they are simply referred to as //Cluster 1, Cluster 2, Cluster 3, etc.// Note that clustering does not use a dependent variable, because we only want to see which **inputs** are similar to one another.

Clustering methods discussed on this page:
  * K-Means Clustering
  * Hierarchical Clustering

==== K-Means Clustering ====

**Form:** Universal. Can be used with any dataset of numeric input features\\
**When to use it:** Used when you want to specify the number of clusters in advance (e.g., you specifically want to see only 5 clusters). If you do not know how many clusters to use, the **Elbow Method** can help you choose a suitable number for your dataset.\\
**Library used:** [[https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans|sklearn.cluster.KMeans]]\\
**General workflow:**
  - Import the **KMeans** class from **sklearn.cluster**
  - Create an instance of the **KMeans()** class
  - Apply the **.fit_predict** method to your independent variables

**Elbow Method:** Fit K-Means for a range of cluster counts (e.g., 1 to 10) and record the within-cluster sum of squares (WCSS) of each fitted model. scikit-learn stores this value in the model's **inertia_** attribute. Plot WCSS against the number of clusters. WCSS generally decreases as clusters are added, so choose the number at the "elbow" of the curve, where adding another cluster stops reducing WCSS substantially.

**Sample code and output:**\\
//Note that we can easily visualize our results by using a 2D plot.
This type of plot is only possible when we have no more than **2** independent variables.//

==== Hierarchical Clustering ====

**Form:** Universal. Can be used with any dataset of numeric input features\\
**When to use it:** Used when you want to see how clusters nest inside one another, or when you want to choose the number of clusters after examining the data. The merge history can be drawn as a **dendrogram**, which helps you decide how many clusters to keep.\\
**Library used:** [[https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html|sklearn.cluster.AgglomerativeClustering]]\\
**General workflow:**
  - Import the **AgglomerativeClustering** class from **sklearn.cluster**
  - Create an instance of the **AgglomerativeClustering()** class, specifying the number of clusters (**n_clusters**)
  - Apply the **.fit_predict** method to your independent variables

**Sample code and output:**\\
{{ :forecasting:simpregcode.png?1600 |}}
{{ :forecasting:simpleregressoutput.png |}}\\
//Note that we can easily visualize our results by using a 2D plot. This type of plot is only possible when we have no more than **2** independent variables.//
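The two clustering methods named at the top of this page, together with the Elbow Method for K-Means, can be sketched in Python with scikit-learn as follows. This is a minimal sketch and not the code from the screenshots. It makes several illustrative assumptions: the data are synthetic 2-feature points generated with **make_blobs**, the choice of 3 clusters is arbitrary, and Ward linkage is used. Plotting is left out. Normally you would plot the WCSS values to find the elbow and plot the labelled points in 2D.

```python
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Illustrative assumption: 300 synthetic points with 2 input features drawn around 3 centres.
X, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=42)

# --- K-Means: Elbow Method ---
# Fit K-Means for k = 1..10 and record each model's WCSS (scikit-learn calls it inertia_).
# n_init is set explicitly because its default changed between scikit-learn versions.
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X)
    wcss.append(km.inertia_)
# Plot wcss against k and pick the k where the curve bends (the "elbow").

# --- K-Means with the chosen number of clusters (3 is an assumption here) ---
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans_labels = kmeans.fit_predict(X)  # one cluster index (0, 1 or 2) per row of X

# --- Hierarchical (agglomerative) clustering ---
hc = AgglomerativeClustering(n_clusters=3, linkage="ward")
hc_labels = hc.fit_predict(X)  # one cluster index per row of X

# SciPy's linkage() returns the full merge history: one row per merge, (n_samples - 1) rows.
# Passing Z to scipy.cluster.hierarchy.dendrogram would draw the dendrogram (needs matplotlib).
Z = linkage(X, method="ward")
```

Both methods return plain cluster indices with no meaning attached, which matches the "Cluster 1, Cluster 2, ..." naming described above. The indices from the two methods need not line up, so K-Means cluster 0 is not necessarily the same group as hierarchical cluster 0.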