Running Logit in StatsModels
tags: #ML/supervised/classification/logit
Step 1. Import required packages
import statsmodels.api as sm
Step 2. Import required dataset
You can get the inputs and output the same way as you did with scikit-learn: split the dataset into a matrix of predictors and a target vector containing the labels:
#split dataset in features and target variable
feature_cols = ["LIST OF COLNAMES"]
X = df[feature_cols] # Features
y = df["target_var"] # Target variable
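As a minimal sketch of this split (the DataFrame, column names, and target name here are hypothetical):

```python
import pandas as pd

# Hypothetical dataset: two predictors and a binary target
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 62],
    "income": [30, 45, 60, 80, 65],
    "target_var": [0, 0, 1, 1, 1],
})

feature_cols = ["age", "income"]
X = df[feature_cols]   # Features: a (5, 2) DataFrame
y = df["target_var"]   # Target variable: a (5,) Series

print(X.shape)  # (5, 2)
print(y.shape)  # (5,)
```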
StatsModels doesn’t add the intercept 𝑏₀ by default, so you need to include an additional column of ones in your feature matrix manually.
You do that with add_constant():
X = sm.add_constant(X)
Step 3. Partition the Dataset
- See also: Partitioning the Dataset
# split dataset (e.g. hold out 25% for testing)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
Step 4. Fit and Train Model
Your logistic regression model is going to be an instance of the class statsmodels.discrete.discrete_model.Logit.
To create an instance of the logit object:
>>> model = sm.Logit(y_train, X_train) # Note that the first argument here is `y`, followed by `x`.
To fit the model with existing data:
result = model.fit()
Step 5. Obtain Results
To print results output:
result.summary() # or result.summary2()
You can obtain the values of 𝑏₀ and 𝑏₁ with .params:
>>> result.params
array([b0, b1]) # intercept and slope
Step 6. Evaluate the Model
You can evaluate the model by first generating its predicted probabilities for the test set using the .predict() method, then producing an accuracy report or confusion matrix:
>>> result.predict(X_test)
You can threshold the predicted probabilities at 0.5 to get the actual predicted classes:
>>> (result.predict(X_test) >= 0.5).astype(int)
array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])