Performance Metrics for Regression Models
tags: #ML/supervised/regression
There are 3 main metrics for evaluating a regression model:
1. R Square/Adjusted R Square
This describes the proportion of variation in the dependent variable (DV) that the model can account for.
For multiple linear regression, use adjusted R2 when multiple independent variables (IVs) are introduced to the model. Since R2 never decreases as you add more predictors, adjusted R2 serves as a metric that tells you how useful a model is after adjusting for the number of predictors in it.
# using sklearn: method 1
from sklearn.metrics import r2_score
r2_score(y_true, y_pred)
# using sklearn: method 2
model.score(X_train, y_train) # for R2
# Manually compute the adjusted R2 using the formula:
Adjusted R2 = 1 - (1 - R2)(n - 1) / (n - k - 1)
where:
- R2: the R2 of the model
- n: the number of observations
- k: the number of predictor variables
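The formula above can be sketched end to end with scikit-learn on a toy dataset (the data and model below are illustrative, not from the source):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# toy data: y depends linearly on the first of three features, plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=50)

model = LinearRegression().fit(X, y)
r2 = r2_score(y, model.predict(X))

# apply the adjusted-R2 formula manually
n, k = X.shape  # n observations, k predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(r2, adj_r2)  # adjusted R2 is never larger than R2
```

Because (n - 1)/(n - k - 1) > 1, the adjusted value can only shrink relative to plain R2, which is exactly the penalty for extra predictors described above.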
If using statsmodels for linear regression:
# if using statsmodels
import statsmodels.api as sm
X = sm.add_constant(X)  # add the intercept column to the feature matrix BEFORE splitting into train/test
model = sm.OLS(y_train, X_train).fit()  # note: OLS takes y first, then X
print(model.rsquared_adj)  # display adjusted R-squared
2. Mean Square Error (MSE)/ Root Mean Square Error (RMSE)
The MSE gives you the average squared difference (error) between your predicted values and the actual values, measured as:
MSE = (1/n) * Σ(y_i - ŷ_i)²
Root Mean Square Error (RMSE) is the square root of MSE. It is reported more often than MSE because it is in the same units as the target variable, which makes it easier to interpret and compare. The smaller the RMSE, the closer the predicted values are to the observed values.
from sklearn.metrics import mean_squared_error
import math
print(mean_squared_error(Y_test, Y_predicted)) # MSE
print(math.sqrt(mean_squared_error(Y_test, Y_predicted))) # RMSE
3. Mean Absolute Error (MAE)
Mean Absolute Error (MAE) is similar to Mean Square Error (MSE), but instead of squaring, it averages the absolute differences between the actual values and the predicted values:
MAE = (1/n) * Σ|y_i - ŷ_i|
from sklearn.metrics import mean_absolute_error
print(mean_absolute_error(Y_test, Y_predicted))
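One practical difference between MAE and RMSE is how they react to outliers: squaring magnifies large errors, so RMSE grows much faster than MAE when a single prediction is far off. A short sketch with toy numbers (illustrative, not from the source):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_good = np.array([2.5, 5.5, 6.5, 9.5])   # small errors everywhere
y_outl = np.array([3.0, 5.0, 7.0, 17.0])  # perfect except one large miss

for name, y_pred in [("small errors", y_good), ("one outlier", y_outl)]:
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    print(name, "MAE:", mae, "RMSE:", rmse)
```

With the outlier, the errors are (0, 0, 0, 8): MAE = 8/4 = 2.0, while RMSE = sqrt(64/4) = 4.0. If large errors are especially costly for your problem, RMSE is the stricter choice; if you want a robust average error, prefer MAE.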