Gaussian Naive Bayes with sklearn

tags: #ML/supervised/classification/nb

Gaussian Naive Bayes assumes that, within each class, every input feature follows a Gaussian (normal) distribution and that the features are conditionally independent of one another given the class.
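
To make the assumption concrete, here is a minimal NumPy sketch of the scoring rule it implies: for each class, add the log prior to the sum of per-feature Gaussian log densities, then pick the class with the highest score. The helper names (`gaussian_log_likelihood`, `predict_one`) and the toy data are illustrative assumptions, and the sketch leaves out the variance smoothing that sklearn's `GaussianNB` applies, so read it as an illustration rather than a copy of the library's implementation.

```python
import numpy as np


def gaussian_log_likelihood(x, mean, var):
    """Sum of per-feature Gaussian log densities (features treated as independent)."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def predict_one(x, X_train, y_train):
    """Return the class with the highest log prior + log likelihood for sample x."""
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        X_c = X_train[y_train == c]  # training rows belonging to class c
        log_prior = np.log(len(X_c) / len(X_train))
        mean, var = X_c.mean(axis=0), X_c.var(axis=0)
        scores.append(log_prior + gaussian_log_likelihood(x, mean, var))
    return classes[int(np.argmax(scores))]


# Toy data (made up for illustration): two well-separated classes.
X_toy = np.array([[1.0, 1.2], [0.8, 1.0], [1.2, 0.9],
                  [5.0, 5.1], [4.8, 5.3], [5.2, 4.9]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(predict_one(np.array([1.1, 1.0]), X_toy, y_toy))
print(predict_one(np.array([5.0, 5.0]), X_toy, y_toy))
```

The first query point sits inside the class 0 cluster and the second inside the class 1 cluster, so the two calls should return 0 and 1 respectively.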

Import required libraries and classifiers

# Import pandas (used to load the data), the Gaussian Naive Bayes classifier, and helpers
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

Import required dataset

df = pd.read_csv('FILENAME.CSV')

Split dataset into a feature matrix and target array containing the label

# Split the dataset into features and target variable
feature_cols = ["LIST OF COLNAMES"]  # placeholder: names of the feature columns
X = df[feature_cols]  # Features
y = df.TARGET_VAR  # Target variable

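Split the dataset into random training and test sets

# test_size and random_state must be passed as keyword arguments;
# TEST_SIZE (e.g. 0.3) and RANDOM_STATE (e.g. 42) are placeholders
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE)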
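
As a quick check of how the split behaves, the sketch below runs `train_test_split` on a small made-up array. The names `X_demo` and `y_demo` and the values `test_size=0.2` and `random_state=0` are assumptions chosen only for illustration. It shows that a float `test_size` is read as the fraction of rows held out, and that fixing `random_state` makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (illustrative only): 10 samples, 2 features, two classes.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1] * 5)

# Hold out 20% of the rows; both options are passed by keyword.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
print(X_tr.shape, X_te.shape)

# Repeating the call with the same random_state yields the same split.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
print(np.array_equal(X_te, X_te2))
```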

Initialize classifier

# Initialize gnb classifier
gnb = GaussianNB()

Fit classifier to the data

# Train the classifier (fit returns the fitted estimator, so model and gnb are the same object):
model = gnb.fit(X_train, y_train)

Predict on unseen data

# Make predictions with the classifier:
y_pred = gnb.predict(X_test)
print(y_pred)

Evaluate accuracy of the model

# Evaluate accuracy: compare the true test labels with the predicted labels
print(accuracy_score(y_test, y_pred))
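
Because the snippets above rely on placeholders (FILENAME.CSV, TARGET_VAR), here is a self-contained sketch of the same steps that can be run as-is. It assumes the iris dataset bundled with sklearn as stand-in data, and `test_size=0.3` and `random_state=0` are arbitrary example values, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in data: the iris dataset shipped with sklearn (assumption for this sketch).
X_iris, y_iris = load_iris(return_X_y=True)

# Split into random training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=0
)

# Initialize and fit the classifier.
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predict labels and class probabilities for the held-out samples.
y_pred = gnb.predict(X_test)
proba = gnb.predict_proba(X_test)

# Accuracy compares true labels (y_test) with predicted labels (y_pred).
acc = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {acc:.3f}")
```

`predict_proba` returns one row per test sample with one column per class, which is handy when a confidence estimate is needed rather than only a hard label.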