===== Clustering =====
**Clustering** is a statistical process used to group "similar" data points together within a dataset. It is similar in nature to classification, where similar data points are grouped together and then assigned a specific label (e.g. dogs or cats). However, clustering differs in that the resulting groups are not given meaningful labels; they are simply referred to as //Cluster 1, Cluster 2, Cluster 3, etc//. Note that in clustering we do not use a dependent variable, because we only want to see which **inputs** are similar to one another.
Clustering methods discussed on this page:
  * K-Means Clustering
  * Hierarchical Clustering
+ | |||
+ | ==== K-Means Clustering ==== | ||
+ | **Form:** Universal. Can be used with any dataset\\ | ||
+ | **When to use it:** Used when you want to define a specific number of clusters in the dataset (eg: maybe you specifically want to see only 5 different clusters). However, you can also use the **Elbow Method** to numerically determine an optimal number of clusters for your dataset.\\ | ||
+ | **Library used:** [[https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans|sklearn.clustering.KMeans]]\\ | ||
+ | **General workflow:** | ||
+ | - Import **KMeans** class from **sklearn.clustering** | ||
+ | - Create an instance of the **KMeans()** class | ||
+ | - Apply the **.fit_predict** method to your independent variables | ||
+ | - **//__ TALK ABOUT ELBOW METHOD HERE__//** | ||
+ | **Sample code and output:**\\ | ||
+ | |||
+ | //Note that we can easily visualize our results by using a 2D plot. This type of plot is only possible when we have no more than **2** independent variable.// | ||
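The K-Means workflow can be sketched as follows. This is a minimal illustration, not the page's own sample code: the dataset is synthetic (generated with ''make_blobs''), and all variable names and parameter choices are assumptions.

```python
# Minimal K-Means sketch on a synthetic 2-feature dataset.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in dataset: 150 points in 2 features, drawn from 3 blobs
X, _ = make_blobs(n_samples=150, centers=3, n_features=2, random_state=42)

# Elbow Method: fit K-Means for k = 1..8 and record the inertia
# (within-cluster sum of squares); the bend in this curve suggests k
inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X)
    inertias.append(km.inertia_)

# Fit the final model with the chosen number of clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)  # one cluster index (0, 1, or 2) per point
```

With only 2 features, ''X'' can be scatter-plotted and colored by ''labels'' to see the clusters directly.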
+ | |||
+ | ==== Hierarchical Clustering ==== | ||
+ | **Form:** y = w0 + w1*x1\\ | ||
+ | **When to use it:** Used when there is only one independent variable whose degree is assumed to be 1\\ | ||
+ | **Library used:** [[https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html|sklearn.linear_model.LinearRegression]]\\ | ||
+ | **General workflow:** | ||
+ | - Import **LinearRegression** library from **sklearn.linear_model** | ||
+ | - Create an instance of the **LinearRegression()** class | ||
+ | - Apply the **.fit** method to your independent and dependent variables | ||
+ | - Apply the **.predict** method to your regressor to make any predictions about your data | ||
+ | **Sample code and output:**\\ | ||
+ | {{ :forecasting:simpregcode.png?1600 |}} | ||
+ | {{ :forecasting:simpleregressoutput.png |}}\\ | ||
+ | //Note that we can easily visualize our results by using a 2D plot. This type of plot is only possible when we have no more than **1** independent variable.// | ||
+ | |||