Gaussian Naive Bayes with sklearn

tags: #ML/supervised/classification/nb

Gaussian Naive Bayes assumes that, within each class, every continuous input feature follows a Gaussian (i.e., normal) distribution, and that the features are conditionally independent of one another given the class.
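To make the per-class Gaussian assumption concrete, here is a minimal sketch on a tiny made-up dataset (the numbers are invented purely for illustration). It checks that the means stored in the fitted classifier's `theta_` attribute are the per-class feature means you would compute by hand.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Tiny synthetic dataset (made up for illustration): two features, two classes.
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
              [3.0, 4.0], [3.2, 4.1], [2.8, 3.9]])
y = np.array([0, 0, 0, 1, 1, 1])

gnb = GaussianNB().fit(X, y)

# Per-class feature means computed by hand, one row per class.
manual_means = np.array([X[y == c].mean(axis=0) for c in np.unique(y)])

# theta_ holds the mean of each feature for each class.
means_match = np.allclose(gnb.theta_, manual_means)
print(means_match)
```

The classifier also stores a per-class variance for each feature, so each (class, feature) pair is summarized by one mean and one variance.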

  1. Import required libraries and classifiers
# Import pandas, the Gaussian Naive Bayes classifier, and helper utilities
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
  1. Load the required dataset
# Read the CSV file into a pandas DataFrame
df = pd.read_csv('FILENAME.CSV')
  1. Split dataset into a feature matrix and target array containing the label
# Split the dataset into features and target variable
feature_cols = ["LIST OF COLNAMES"]
X = df[feature_cols]  # Features
y = df.TARGET_VAR  # Target variable
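  1. Split the dataset into random training and test sets
# test_size and random_state must be passed as keyword arguments.
# The values here are examples: hold out 25% for testing and fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)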
  1. Initialize classifier
# Initialize gnb classifier
gnb = GaussianNB()
  1. Fit classifier to the data
# Train the classifier:
model = gnb.fit(X_train, y_train)
  1. Predict with unseen data
# Make predictions with the classifier:
y_pred = gnb.predict(X_test)
print(y_pred)
  1. Evaluate accuracy of the model
# Evaluate accuracy by comparing the predicted labels with the true test labels:
print(accuracy_score(y_test, y_pred))
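The steps above can be put together into one runnable sketch. This is an illustration under stated assumptions, not the note's own dataset: scikit-learn's bundled iris data stands in for the CSV file, and the `test_size`, `random_state`, and `stratify` settings are example choices.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: the bundled iris dataset stands in for FILENAME.CSV.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows for testing; stratify keeps class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the classifier on the training split only.
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predict labels for the held-out rows and score them against the true labels.
y_pred = gnb.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {acc:.3f}")
```

Note that `accuracy_score` takes the true labels first and the predicted labels second; passing the feature matrix instead of `y_test` raises an error or gives a meaningless score.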