**x** is the location of some data point and **l** is the //landmark//, or center, of the kernel. Note that this Kernel would work with 1-dimensional data as well; K would still be a function of **x** and **l**, but the visualization would have the data on a 1D line while the visualization of the Kernel is K vs x.
  
What's happening with the equation is that we're calculating the distance between the landmark and points in our dataset and then adjusting sigma to fit an optimal margin. Sigma controls the base of the RBF function, and the base gets mapped onto the dataset, which forms the margin that separates the data. Let's consider the dataset in Fig. 18 again. To perform the kernel trick, we first set the position of the landmark and then apply the kernel. A visual of this is shown in Fig. 22; we can see that the base of the kernel is mapped onto the 2D space, and this is the separation between the categories.

{{ :forecasting:kernelsvm8.jpg?600 |}}
**Fig. 22.** Applying the Gaussian RBF Kernel onto the data.
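
To make the formula concrete, here is a minimal sketch (not part of the original page) of how the Gaussian RBF kernel value could be computed for a 2D point relative to a landmark; the landmark position, the sample points, and the value of sigma are arbitrary illustrative choices.

<code python>
import numpy as np

def gaussian_rbf(x, l, sigma):
    # K(x, l) = exp(-||x - l||^2 / (2 * sigma^2))
    return np.exp(-np.sum((x - l) ** 2) / (2 * sigma ** 2))

# Illustrative values only: landmark at the origin, one point near it, one far away.
landmark = np.array([0.0, 0.0])
near_point = np.array([0.3, -0.2])
far_point = np.array([3.0, 2.5])

sigma = 1.0
print(gaussian_rbf(near_point, landmark, sigma))  # close to 1: inside the base of the kernel
print(gaussian_rbf(far_point, landmark, sigma))   # close to 0: outside the base of the kernel
</code>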

As mentioned before, sigma controls the size of the base, which in turn controls how large our margin is. Just like in SVM, we adjust the margin (controlled by sigma) to be the maximum margin, that is, the maximum distance between the two categories. For some intuition: if we increased sigma, we'd have a larger base and thus a larger circle in this example; if we decreased sigma, we'd have a smaller base and thus a smaller circle. Larger and smaller sigma are shown in Fig. 23 and Fig. 24, respectively.

{{ :forecasting:kernelsvm9.jpg?600 |}}
**Fig. 23.** Graph that provides intuition for a larger sigma.

{{ :forecasting:kernelsvm10.jpg?600 |}}
**Fig. 24.** Graph that provides intuition for a smaller sigma.
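
As a rough numerical illustration of Fig. 23 and Fig. 24 (not from the original page): holding a point at a fixed distance from the landmark and varying sigma shows how a larger sigma widens the base (larger K for the same point) while a smaller sigma shrinks it.

<code python>
import numpy as np

def gaussian_rbf(x, l, sigma):
    return np.exp(-np.sum((x - l) ** 2) / (2 * sigma ** 2))

landmark = np.array([0.0, 0.0])
point = np.array([1.5, 0.0])  # fixed distance of 1.5 from the landmark

for sigma in (0.5, 1.0, 2.0):  # illustrative sigma values
    print(sigma, gaussian_rbf(point, landmark, sigma))
# Smaller sigma -> K near 0 (the point falls outside the base);
# larger sigma -> K noticeably above 0 (the point falls inside the base).
</code>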

After the machine has trained the model on the training set and found the optimal margin, we have effectively separated the categories. Therefore, when predicting or categorizing new data, we simply apply the kernel to the data point and calculate K. If the point falls outside of the margin, K is effectively 0, and if it falls inside the margin, K > 0.
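
A toy sketch of that decision rule (not the trained model itself, just the idea in the paragraph above): evaluate K for a new point and treat values above a small threshold as falling inside the margin. The threshold, landmark, and sigma here are hypothetical; an actual Kernel SVM learns the decision boundary from the training data.

<code python>
import numpy as np

def gaussian_rbf(x, l, sigma):
    return np.exp(-np.sum((x - l) ** 2) / (2 * sigma ** 2))

def categorize(x, landmark, sigma, threshold=1e-3):
    # K > threshold -> inside the margin (one category);
    # otherwise K is treated as 0 -> outside the margin (the other category).
    k = gaussian_rbf(x, landmark, sigma)
    return "inside margin" if k > threshold else "outside margin"

print(categorize(np.array([0.2, 0.1]), np.array([0.0, 0.0]), sigma=1.0))  # inside margin
print(categorize(np.array([4.0, 4.0]), np.array([0.0, 0.0]), sigma=1.0))  # outside margin
</code>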

=== Computation Restrictions ===
Recall that the problem with mapping to a higher dimension was that it is computationally intensive: we needed to map each 2D data point into 3D space, find the optimal hyperplane in the 3D space, and finally map everything back to the 2D space. With the kernel trick, none of the data is mapped onto a higher dimension. Recall the Gaussian RBF Kernel shown in Fig. 21; the kernel is shown in 3D to visualize the energy of the kernel within the margin. Everything that is outside of the margin is considered to have K = 0, which means that every point that results in K = 0 belongs to one category. For the other category, we simply check whether K > 0; if so, the point belongs to that category. The kernel is updated as the margin is adjusted to match the training data. There is no mapping into a third dimension or finding a 3D hyperplane.
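
To underline that point, here is a small sketch (not from the original page) that computes all pairwise Gaussian RBF kernel values directly from the original 2D coordinates; no 3D features are ever constructed. The data here is random and purely illustrative.

<code python>
import numpy as np

def rbf_kernel_matrix(X, sigma):
    # Pairwise K values computed straight from the 2D points;
    # nothing is mapped into a higher-dimensional space.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

X = np.random.rand(5, 2)       # five 2D points (illustrative random data)
K = rbf_kernel_matrix(X, 1.0)  # 5x5 matrix of kernel values, all computed in 2D
print(K.shape)
</code>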

The Python implementation for Kernel SVM is as follows (does not include importing the dataset, splitting the dataset, and feature scaling):

{{ :forecasting:kernelsvmcode.jpg?600 |}}
**Fig. 25.** Kernel SVM Python implementation.
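
Since Fig. 25 is an image, the snippet below is only a sketch of a typical scikit-learn Kernel SVM setup, not a transcription of the figure, and it may differ from the exact code shown there. The small synthetic dataset and preprocessing steps are stand-ins so the sketch runs on its own.

<code python>
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

# Synthetic 2D dataset standing in for the page's dataset (illustrative only):
# an inner circle vs. an outer ring, which is not linearly separable.
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (np.sum(X ** 2, axis=1) < 1.0).astype(int)

# Split and feature-scale (these steps are omitted in Fig. 25)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Gaussian RBF kernel SVM
classifier = SVC(kernel='rbf', random_state=0)
classifier.fit(X_train, y_train)

# Evaluate on the test set
y_pred = classifier.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
</code>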

This method yielded 93% accuracy, and a visualization of the results is shown below:

{{ :forecasting:kernelsvmresults.png?600 |}}
**Fig. 26.** Kernel SVM Test Set Results.