Gaussian Naive Bayes with sklearn
tags: #ML/supervised/classification/nb
Gaussian Naive Bayes assumes that, within each class, every input feature follows a Gaussian (normal) distribution, and that the features are conditionally independent given the class.
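To make the Gaussian assumption concrete, here is a minimal sketch on made-up toy data. It checks that the fitted per-class means (`theta_`) are the class-wise feature means, then re-derives the predictions by hand from the log prior plus the sum of per-feature Gaussian log-likelihoods. The data, `X_new`, and the helper `manual_predict` are hypothetical, and the hand-rolled version skips sklearn's small variance smoothing, so scores may differ slightly even though the predicted classes agree here.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data (made up for illustration): two features, two classes
X = np.array([[1.0, 2.1], [1.2, 1.9], [0.8, 2.0],
              [3.0, 4.2], [3.2, 3.8], [2.9, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
classes = np.unique(y)

gnb = GaussianNB().fit(X, y)

# theta_ holds the per-class, per-feature means estimated by the model
manual_means = np.array([X[y == c].mean(axis=0) for c in classes])
means_match = np.allclose(gnb.theta_, manual_means)

def manual_predict(X_new):
    """Hypothetical helper: pick the class maximizing log prior + Gaussian log-likelihood."""
    scores = []
    for c in classes:
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        var = Xc.var(axis=0)  # no variance smoothing here, unlike sklearn
        log_prior = np.log(len(Xc) / len(X))
        # Independence assumption: per-feature log densities simply add up
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X_new - mu) ** 2 / var, axis=1)
        scores.append(log_prior + log_lik)
    return classes[np.argmax(np.array(scores), axis=0)]

X_new = np.array([[1.1, 2.0], [3.1, 4.1]])
print(means_match)
print(manual_predict(X_new), gnb.predict(X_new))
```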
- Import required libraries and classifiers
# Import required libraries and Gaussian Naive Bayes classifier
import pandas as pd  # needed for pd.read_csv below
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
- Import required dataset
df = pd.read_csv('FILENAME.CSV')
- Split dataset into a feature matrix and target array containing the label
# Split dataset into features and target variable
feature_cols = ["LIST OF COLNAMES"]
X = df[feature_cols] # Features
y = df.TARGET_VAR # Target variable
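- Split dataset into random training and test sets
# test_size and random_state must be passed as keyword arguments;
# TEST_SIZE is the held-out fraction (e.g. 0.25) and SEED is any integer for a reproducible split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=TEST_SIZE, random_state=SEED)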
- Initialize classifier
# Initialize gnb classifier
gnb = GaussianNB()
- Fit classifier to the data
# Train the classifier (fit returns the fitted estimator itself, so model and gnb are the same object):
model = gnb.fit(X_train, y_train)
- Predict with unseen data
# Make predictions with the classifier:
y_pred = gnb.predict(X_test)
print(y_pred)
- Evaluate accuracy of the model
# Compare the true test labels with the predictions (fraction classified correctly):
print(accuracy_score(y_test, y_pred))
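The steps above can be put together in one runnable sketch. Since the CSV file and column names are placeholders, this version swaps in the iris dataset bundled with scikit-learn; the dataset choice, `test_size=0.3`, and `random_state=0` are assumptions made purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in dataset (assumption): iris replaces the placeholder CSV
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows for testing; fixed seed for a reproducible split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Initialize and train the classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predict on the held-out data and score against the true labels
y_pred = gnb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
```

Note that `accuracy_score` takes the true labels first and the predictions second; passing the feature matrix instead of `y_test` raises an error or gives a meaningless score.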