Running k-NN with sklearn

tags: #ML/supervised/classification/knn

  1. Import required libraries and classes
# Import libraries and classes required for this example:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler 
import pandas as pd
import numpy as np
  2. Import the required dataset
df = pd.read_csv('FILENAME.CSV')
  3. Split the dataset into a matrix of predictors and a target vector containing the labels
# Split dataset into features and target variable
feature_cols = ["LIST OF COLNAMES"] 
X = df[feature_cols] # Features 
y = df["target_Var"] # Target variable
  4. Split the dataset into random training and test datasets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)  # e.g. hold out 25% of rows for testing

# see size of training and test dataset
X_train.shape
X_test.shape
  5. Standardize features
# Standardize features by removing the mean and scaling to unit variance:
scaler = StandardScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
  6. Instantiate the KNN classifier and fit the data
# Use the KNN classifier to fit data:
classifier = KNeighborsClassifier(n_neighbors=5) # create instance of KNN with k=5
classifier.fit(X_train, y_train) # fit KNN to the data
  7. Predict y data with the classifier using X_test
# Predict y data with classifier: 
y_predict = classifier.predict(X_test)
  8. Evaluate model performance: Confusion Matrix, Classification Report
# evaluate model performance
print(confusion_matrix(y_test, y_predict))
print(classification_report(y_test, y_predict))
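The steps above can be tied together into a single runnable sketch. Here the built-in iris dataset stands in for the CSV file, and the 20% test split and `random_state=42` are illustrative assumptions, not values from the original example:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix

# Built-in dataset used in place of the CSV (assumption for illustration)
X, y = load_iris(return_X_y=True)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Standardize: fit the scaler on the training set only,
# then apply the same transform to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Fit k-NN with k=5 and predict on the held-out test set
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)
y_predict = classifier.predict(X_test)

print(confusion_matrix(y_test, y_predict))
print(classification_report(y_test, y_predict))
```

Fitting the scaler on the training set only avoids leaking information from the test set into the preprocessing step.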

The KNN classifier also has a .score() method for evaluating how well the estimator performs on the test dataset. As this is a classifier, it returns the mean prediction accuracy on the test dataset:

classifier.score(X_test, y_test) # accuracy depends on the chosen k-value
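Because the score depends on k, a small sweep over candidate values can help pick one. A minimal sketch, assuming the built-in iris dataset stands in for the CSV and the k range 1–15 is an arbitrary illustrative choice:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Built-in dataset used in place of the CSV (assumption for illustration)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Record test accuracy for each k in the sweep
scores = {}
for k in range(1, 16):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores[k] = clf.score(X_test, y_test)

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

Note that choosing k on the test set overfits to it; in practice a validation split or cross-validation (e.g. `sklearn.model_selection.cross_val_score`) is the safer way to tune k.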

