Performance Metrics for Regression Models
tags: #ML/supervised/regression
There are 3 main metrics for evaluating a regression model:
1. R Square/Adjusted R Square
This describes the proportion of variation in the dependent variable (DV) that the model can account for.
For multiple linear regression, use adjusted R2 when multiple independent variables (IVs) are introduced to the model. Since R2 never decreases as you add more predictors, adjusted R2 serves as a metric that tells you how useful a model is after adjusting for the number of predictors in it.
# using sklearn: method 1
from sklearn.metrics import r2_score
r2_score(y_true, y_pred)
# using sklearn: method 2
model.score(X_train, y_train) # for R2
# Manually compute the adjusted R2 using the formula:
Adjusted R2 = 1 - (1 - R2)(n - 1) / (n - k - 1)
where:
- R2: the R2 of the model
- n: the number of observations
- k: the number of predictor variables
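The formula above can be sketched end to end with scikit-learn on a toy dataset (the data and model below are illustrative, not from the source):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# toy data: y depends linearly on the first of three features, plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=50)

model = LinearRegression().fit(X, y)
r2 = r2_score(y, model.predict(X))

# apply the adjusted-R2 formula manually
n, k = X.shape  # n observations, k predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(r2, adj_r2)  # adjusted R2 is never larger than R2
```

Because (n - 1)/(n - k - 1) > 1, the adjusted value can only shrink relative to plain R2, which is exactly the penalty for extra predictors described above.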
If using statsmodels for linear regression:
# if using statsmodels
import statsmodels.api as sm
X = sm.add_constant(X)  # add the intercept column to the feature matrix BEFORE splitting into train/test
model = sm.OLS(y_train, X_train).fit()  # note: OLS takes y first, then X
print(model.rsquared_adj)  # display adjusted R-squared
2. Mean Square Error (MSE)/ Root Mean Square Error (RMSE)
The MSE gives you the average squared difference (error) between your predicted values and the actual values, measured as:
MSE = (1/n) * Σ(y_i - ŷ_i)²
Root Mean Square Error (RMSE) is the square root of MSE. It is reported more often than MSE because it is in the same units as the target variable, which makes it easier to interpret and compare. The smaller the RMSE, the closer the predicted values are to the observed values.
from sklearn.metrics import mean_squared_error
import math
print(mean_squared_error(Y_test, Y_predicted)) # MSE
print(math.sqrt(mean_squared_error(Y_test, Y_predicted))) # RMSE
3. Mean Absolute Error (MAE)
Mean Absolute Error (MAE) is similar to Mean Square Error (MSE), but instead of squaring, it averages the absolute differences between the actual values and the predicted values:
MAE = (1/n) * Σ|y_i - ŷ_i|
from sklearn.metrics import mean_absolute_error
print(mean_absolute_error(Y_test, Y_predicted))
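One practical difference between MAE and RMSE is how they react to outliers: squaring magnifies large errors, so RMSE grows much faster than MAE when a single prediction is far off. A short sketch with toy numbers (illustrative, not from the source):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_good = np.array([2.5, 5.5, 6.5, 9.5])   # small errors everywhere
y_outl = np.array([3.0, 5.0, 7.0, 17.0])  # perfect except one large miss

for name, y_pred in [("small errors", y_good), ("one outlier", y_outl)]:
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    print(name, "MAE:", mae, "RMSE:", rmse)
```

With the outlier, the errors are (0, 0, 0, 8): MAE = 8/4 = 2.0, while RMSE = sqrt(64/4) = 4.0. If large errors are especially costly for your problem, RMSE is the stricter choice; if you want a robust average error, prefer MAE.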