Running k-NN with sklearn

tags: #ML/supervised/classification/knn

  1. Import required libraries and classes
# Import libraries and classes required for this example:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler 
import pandas as pd
import numpy as np
  2. Import the required dataset
df = pd.read_csv('FILENAME.CSV')
  3. Split the dataset into a matrix of predictors and a target vector containing the labels
# Split dataset into features and target variable
feature_cols = ["LIST OF COLNAMES"] 
X = df[feature_cols] # Features 
y = df["target_Var"] # Target variable
  4. Split the dataset into random training and test datasets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)  # e.g. hold out 25% of rows for testing

# see size of training and test dataset
X_train.shape
X_test.shape
  5. Standardize features
# Standardize features by removing the mean and scaling to unit variance:
scaler = StandardScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
  6. Instantiate the KNN classifier and fit the data
# Use the KNN classifier to fit data:
classifier = KNeighborsClassifier(n_neighbors=5) # create instance of KNN with k=5
classifier.fit(X_train, y_train) # fit KNN to the data
  7. Predict y data with the classifier using X_test
# Predict y data with classifier: 
y_predict = classifier.predict(X_test)
  8. Evaluate model performance: Confusion Matrix, Classification Report
# evaluate model performance
print(confusion_matrix(y_test, y_predict))
print(classification_report(y_test, y_predict))
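The steps above can be tied together into a single runnable sketch. Here the built-in iris dataset stands in for the CSV file, and the 20% test split and `random_state=42` are illustrative assumptions, not values from the original example:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix

# Built-in dataset used in place of the CSV (assumption for illustration)
X, y = load_iris(return_X_y=True)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Standardize: fit the scaler on the training set only,
# then apply the same transform to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Fit k-NN with k=5 and predict on the held-out test set
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)
y_predict = classifier.predict(X_test)

print(confusion_matrix(y_test, y_predict))
print(classification_report(y_test, y_predict))
```

Fitting the scaler on the training set only avoids leaking information from the test set into the preprocessing step.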

The KNN classifier also has a .score() method for evaluating how well the estimator performs on the test dataset. As this is a classifier, it returns the mean prediction accuracy on the test dataset:

classifier.score(X_test, y_test) # accuracy depends on the chosen k-value
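Because the score depends on k, a small sweep over candidate values can help pick one. A minimal sketch, assuming the built-in iris dataset stands in for the CSV and the k range 1–15 is an arbitrary illustrative choice:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Built-in dataset used in place of the CSV (assumption for illustration)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Record test accuracy for each k in the sweep
scores = {}
for k in range(1, 16):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores[k] = clf.score(X_test, y_test)

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

Note that choosing k on the test set overfits to it; in practice a validation split or cross-validation (e.g. `sklearn.model_selection.cross_val_score`) is the safer way to tune k.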

