
Clustering is a statistical process used to group “similar” data points together within a dataset. It is similar in nature to classification, where similar data points are grouped together and then assigned a specific label (e.g., dogs or cats). Clustering differs in that the distinct groups are not given any labels; they are simply referred to as Cluster 1, Cluster 2, Cluster 3, etc. Note that in clustering we do not use any dependent variable, because we only want to see which inputs are similar to one another.

Clustering methods discussed on this page:

K-Means Clustering
Hierarchical Clustering

Form: Universal. Can be used with any dataset whose independent variables are numeric
When to use it: Used when you want to define a specific number of clusters in the dataset (e.g., you specifically want to see only 5 different clusters). If you do not have a number in mind, you can use the Elbow Method to estimate a suitable number of clusters for your dataset.
Library used: sklearn.cluster.KMeans
General workflow:

Import the KMeans class from sklearn.cluster
Create an instance of the KMeans() class
Apply the .fit_predict method to your independent variables
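(Optional) Use the Elbow Method to choose the number of clusters: fit KMeans for a range of cluster counts, compare the within-cluster sum of squares (the inertia_ attribute) for each count, and pick the point where it stops dropping sharply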
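
The Elbow Method mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic (three well-separated 2D blobs generated with NumPy), and the range of cluster counts (1 to 6) is an arbitrary choice. The only scikit-learn pieces used are KMeans and its inertia_ attribute, which holds the within-cluster sum of squared distances.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (an assumption for illustration): three blobs of 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit one model per candidate cluster count and record its inertia.
k_values = list(range(1, 7))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

# The "elbow" is the k after which the inertia stops dropping sharply.
for k, inertia in zip(k_values, inertias):
    print(k, round(inertia, 1))
```

With data like this, the inertia should fall steeply up to 3 clusters and flatten afterwards, which suggests 3 as a reasonable choice. On real data the elbow is often less clear-cut, so treat it as a guide rather than a definitive answer.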

Sample code and output:
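
A minimal sketch of the K-Means workflow above, assuming synthetic data in place of a real dataset. The variable names (X, kmeans, labels) and the choices n_clusters=3, n_init=10, and random_state=0 are illustrative assumptions, not values taken from this page.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic independent variables (an assumption for illustration):
# three well-separated blobs of 2D points, 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Create an instance of KMeans with the desired number of clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)

# fit_predict learns the centroids and returns one cluster index per row of X.
labels = kmeans.fit_predict(X)

print(labels[:5])               # cluster index of the first five points
print(kmeans.cluster_centers_)  # one centroid per cluster
```

The cluster indices (0, 1, 2) are arbitrary identifiers: they say which points belong together, not what the groups mean.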

Note that we can easily visualize our results by using a 2D plot. This type of plot is only possible when we have no more than 2 independent variables.
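
One possible way to draw such a 2D plot is sketched below, assuming matplotlib is available and using the same kind of synthetic two-variable data as a stand-in for a real dataset. The output file name kmeans_clusters.png, the axis labels, and the colour choices are all illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data with exactly two independent variables (an assumption).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

fig, ax = plt.subplots()
# Colour each point by the cluster it was assigned to.
ax.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=15)
# Mark the learned centroids.
ax.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
           c="red", marker="x", s=100, label="Centroids")
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
ax.legend()
fig.savefig("kmeans_clusters.png")
```

With more than two independent variables, the points can no longer be placed directly on two axes, which is why the plot is limited to this case.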

Form: y = w0 + w1*x1
When to use it: Used when there is only one independent variable whose degree is assumed to be 1
Library used: sklearn.linear_model.LinearRegression
General workflow:

Import the LinearRegression class from sklearn.linear_model
Create an instance of the LinearRegression() class
Apply the .fit method to your independent and dependent variables
Apply the .predict method to your regressor to make any predictions about your data

Sample code and output:
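
A minimal sketch of the LinearRegression workflow above, assuming synthetic data that follows y = 2 + 3*x exactly. The data and the variable names (X, y, regressor, y_pred) are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y = 2 + 3*x with no noise.
# scikit-learn expects X as a 2D array of shape (n_samples, n_features).
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 + 3.0 * X.ravel()

# Create the regressor and fit it to the independent and dependent variables.
regressor = LinearRegression()
regressor.fit(X, y)

# intercept_ corresponds to w0 and coef_[0] to w1 in y = w0 + w1*x1.
print(regressor.intercept_, regressor.coef_[0])

# Predict y for a new value of x.
y_pred = regressor.predict(np.array([[10.0]]))
print(y_pred)
```

Because the data here is noise-free, the fitted intercept and slope should come out very close to 2 and 3; real data would give estimates with some error.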

Note that we can easily visualize our results by using a 2D plot. This type of plot is only possible when we have no more than 1 independent variable.

Contributing authors:

jclaudio

Created by jclaudio on 2020/12/02 01:24.
