About kNN
tags: #ML/supervised/classification/knn
What is kNN?
kNN (k-nearest neighbours) is a supervised ML algorithm used for both classification and regression tasks.
How does kNN work?
Classification Tasks
kNN classifies a new data point by finding the k known data points (neighbours) closest to it, i.e. the k points with the lowest Euclidean distance, and assigning the new point the majority class among those neighbours.
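The classification step can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function name `knn_classify`, the toy points, and the labels are all made up for this note, and ties are not handled explicitly.

```python
from collections import Counter
import math

def knn_classify(train_points, train_labels, new_point, k=3):
    """Predict the label of new_point by majority vote among its k nearest neighbours."""
    # Pair the distance from the new point to each training point with that point's label.
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k pairs with the lowest Euclidean distance.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Count the labels of those neighbours and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data (made up for illustration): two features per point.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0), (6.5, 8.0)]
train_labels = ["red", "red", "red", "blue", "blue", "blue"]

prediction = knn_classify(train_points, train_labels, (2.0, 2.0), k=3)
print(prediction)  # red
```

Every distance is recomputed for each prediction, which is fine for a small example but slow on large training sets.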
What happens when there is a tie?
- Increase or decrease k by 1 and repeat the vote
- Random tie-breaking - the class is chosen at random among the tied classes. This approach is less common and is typically a last resort when the other approaches cannot be applied.
- Use baseline classifier - the baseline classifier assigns the class that occurs most frequently in the training set to the test example
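The three strategies above can be sketched as small helper functions. All names here (`tied_classes`, `break_tie_by_shrinking_k`, and so on) are hypothetical, and the sketch assumes the neighbour labels are already sorted from nearest to farthest. Only the "decrease k" variant of the first strategy is shown.

```python
from collections import Counter
import random

def tied_classes(neighbour_labels):
    """Return every class that shares the highest vote count."""
    votes = Counter(neighbour_labels)
    top = max(votes.values())
    return [label for label, count in votes.items() if count == top]

def break_tie_by_shrinking_k(sorted_labels, k):
    """Decrease k by 1 until a single class wins the vote."""
    while k > 1 and len(tied_classes(sorted_labels[:k])) > 1:
        k -= 1
    return tied_classes(sorted_labels[:k])[0]

def break_tie_randomly(tied, rng):
    """Pick one of the tied classes at random."""
    return rng.choice(tied)

def break_tie_with_baseline(train_labels):
    """Return the class that occurs most frequently in the training set."""
    return Counter(train_labels).most_common(1)[0][0]

# Toy labels (made up), ordered from nearest to farthest neighbour.
sorted_labels = ["red", "blue", "blue", "red", "blue"]

tied = tied_classes(sorted_labels[:4])               # k=4: two red, two blue
print(tied)                                          # ['red', 'blue']
print(break_tie_by_shrinking_k(sorted_labels, 4))    # blue (k=3 gives 1 red, 2 blue)
print(break_tie_randomly(tied, random.Random(0)))    # one of the tied classes
print(break_tie_with_baseline(sorted_labels))        # blue (3 of the 5 labels)
```

With more than two classes, changing k by 1 does not always remove the tie, which is why the loop keeps shrinking k until one class wins.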
Regression Tasks
The algorithm takes the target values of the k nearest neighbours and computes their mean, which becomes the prediction for the new data point.
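A minimal sketch of the regression case, assuming Euclidean distance is used to find the neighbours as in classification. The function name `knn_regress` and the toy data are made up for illustration.

```python
import math

def knn_regress(train_points, train_targets, new_point, k=3):
    """Predict a numeric target as the mean target of the k nearest neighbours."""
    # Pair the distance from the new point to each training point with that point's target.
    distances = [
        (math.dist(point, new_point), target)
        for point, target in zip(train_points, train_targets)
    ]
    # Keep the k pairs with the lowest distance.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Average the target values of those neighbours.
    return sum(target for _, target in nearest) / len(nearest)

# Toy data (made up for illustration).
train_points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (10.0, 10.0)]
train_targets = [10.0, 20.0, 30.0, 100.0]

estimate = knn_regress(train_points, train_targets, (2.0, 2.0), k=3)
print(estimate)  # 20.0, the mean of 10.0, 20.0 and 30.0
```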
How do we compute the Euclidean distance of two points?
Suppose we have two points:
- Point A: (x1, y1)
- Point B: (x2, y2)
We can compute the Euclidean distance as follows:
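This is the standard two-dimensional Euclidean distance formula, written with the coordinates of Point A and Point B defined above. The symbol $d(A, B)$ for the distance is notation assumed here.

$$
d(A, B) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
$$

For example, with A = (1, 2) and B = (4, 6): $d(A, B) = \sqrt{(4 - 1)^2 + (6 - 2)^2} = \sqrt{9 + 16} = 5$.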
In the case of kNN, we compute the distance between the new data point and each known data point, assuming there are 2 features (x and y):
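Writing the new point as $(x_{new}, y_{new})$ and the i-th known point as $(x_i, y_i)$ (notation assumed here), the distance is $d_i = \sqrt{(x_i - x_{new})^2 + (y_i - y_{new})^2}$. A minimal Python sketch of that computation follows. The function name and the sample points are made up for illustration.

```python
import math

def distances_to_known_points(new_point, known_points):
    """Return the Euclidean distance from new_point to each known (x, y) point."""
    x_new, y_new = new_point
    return [
        math.sqrt((x_i - x_new) ** 2 + (y_i - y_new) ** 2)
        for x_i, y_i in known_points
    ]

# Toy data (made up for illustration): two features, x and y.
known_points = [(1.0, 2.0), (4.0, 6.0), (0.0, 0.0)]
new_point = (1.0, 2.0)

distances = distances_to_known_points(new_point, known_points)
print(distances)
```

Sorting these distances and keeping the k smallest gives the neighbours used for the vote (classification) or the mean (regression).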