IV. ML Models

Refer to General Data Science for more specific topics.

Core Principles of ML/AI

Everything we do in ML/AI is a variation on the following principles:

  1. Each ML algorithm has an underlying computational model with a set of learnable parameters (e.g., linear regression, where the coefficients are learnable parameters; note that k in kNN is a hyperparameter chosen by the user, not a learnable parameter).
  2. A learning algorithm searches for the set of parameter values that best maps the inputs to their outputs.
  3. We find this optimal set of learnable parameters by minimizing some measure of error (e.g., gradient descent, or the Perceptron Learning Rule (PLR) for binary logistic regression).
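The three steps above can be sketched end to end with gradient descent on a simple linear model (a minimal numpy sketch; the toy data, learning rate, and iteration count are illustrative assumptions, not part of these notes):

```python
import numpy as np

# Toy data: y = 2x + 1 plus a little noise (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
y = 2 * X + 1 + rng.normal(0, 0.5, size=50)

# Step 1: the model y_hat = w*x + b, with learnable parameters w and b
w, b = 0.0, 0.0
lr = 0.01  # learning rate: a hyperparameter, not a learnable parameter

# Steps 2 and 3: the learning algorithm (gradient descent) finds the
# parameter values that minimize the mean squared error
for _ in range(2000):
    error = (w * X + b) - y
    grad_w = 2 * np.mean(error * X)  # d(MSE)/dw
    grad_b = 2 * np.mean(error)      # d(MSE)/db
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # close to the true values 2 and 1
```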

When learning a new model, identify its learnable parameters and how the optimal values of those parameters are found (i.e., which loss function is minimized).

Supervised ML Algorithms: Regression

Linear Regression

Assumptions for Linear Regression

One of the assumptions of the linear regression model is that the underlying relationship is linear. Therefore, before running a linear regression, ensure that the relationship between the independent variables (IDV) and the dependent variable (DV) is linear. Otherwise, use polynomial (non-linear) regression.
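One quick numerical sanity check of the linearity assumption is to compare a linear fit against a low-degree polynomial fit (a numpy sketch on made-up non-linear data; in practice a scatterplot of IDV vs DV serves the same purpose):

```python
import numpy as np

# Made-up data with a clearly non-linear (quadratic) relationship
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 200)
y = x**2 + rng.normal(0, 0.3, 200)

def r_squared(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

# Fit a degree-1 (linear) and a degree-2 (polynomial) model
lin = np.polyval(np.polyfit(x, y, 1), x)
quad = np.polyval(np.polyfit(x, y, 2), x)

print(r_squared(y, lin))   # near 0: a straight line explains little
print(r_squared(y, quad))  # near 1: the quadratic fits well
```

If the linear R² is already close to the polynomial one, the linearity assumption is reasonable.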

| File | Comments | Type |
| ---- | -------- | ---- |
| _About Linear Regression | - | Regression, Multivariate Analysis |
| Interpreting the LR Model | How to interpret the output of a LR model (regression coefficients, intercepts, p-values, t-statistics) | Output |
| Performance Metrics for Regression Models | R2 Score, MAE, MSE, etc. | Model Evaluation |
| Regression Summary | Generating a regression summary based on the performance metrics. | Model Evaluation |
| Running Linear Regression with sklearn | Running LR with sklearn. | Code |
| Running Linear Regression with statsmodel.api | Running LR using statsmodel.api and ols as the regression model. | Code |
| Simple vs Multiple Linear Regression | Understanding the difference between simple and multiple linear regression (partial slopes). | - |
| Visualizing Predicted vs Actual | Visualizing predicted vs actual values in a scatterplot. | Scatterplot |


Supervised ML Algorithms: Classification

Logistic Regression

Note!

Binary (linear) logistic regression is a classification algorithm. It also assumes that the data is linearly separable; otherwise, use a non-linear classification algorithm.
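The core idea (a sigmoid activation turning a linear score into a probability, trained by minimizing cross-entropy) can be sketched from scratch in numpy; the toy linearly separable data is an assumption for illustration, and the sklearn/statsmodels versions live in the linked notes:

```python
import numpy as np

# Toy linearly separable data: class decided by the sign of x1 + x2
rng = np.random.default_rng(2)
X = rng.normal(0, 1, (200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def sigmoid(z):
    # Activation that maps a linear score to a probability in (0, 1)
    return 1 / (1 + np.exp(-z))

# Learnable parameters: weight vector w and bias b
w = np.zeros(2)
b = 0.0
lr = 0.1

for _ in range(1000):
    p = sigmoid(X @ w + b)           # predicted probability of class 1
    grad_w = X.T @ (p - y) / len(y)  # gradient of the cross-entropy loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

pred = (sigmoid(X @ w + b) >= 0.5).astype(float)
print(np.mean(pred == y))  # training accuracy; near 1.0 on separable data
```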

| File | Comments | Type |
| ---- | -------- | ---- |
| _About Binary Classification | Binary classification using the Perceptron Learning Rule and using an activation function to convert the binary classifier into a Logistic Regression algorithm as a probabilistic model. | Binary and Multi-class |
| Classification Reports | Computing the precision, recall and F1-score for each class. | Model Evaluation |
| Confusion Matrix | Evaluating the performance of a classification model using a confusion matrix based on 4 outcomes: TP, TN, FP, FN. Information on the 4 performance metrics, including the weighted F1-score for multi-classification tasks. How to generate a confusion matrix in Python. | Model Evaluation |
| Running Logit with statsmodel | Running Logit with statsmodel.api | Multivariate Analysis, Code |
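The four confusion-matrix outcomes and the metrics built from them can be computed directly from predictions (the labels below are made up for illustration):

```python
import numpy as np

# Toy predictions from a binary classifier (assumed for illustration)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
print(tp, tn, fp, fn)  # → 4 4 1 1

precision = tp / (tp + fp)  # of predicted positives, how many were right
recall = tp / (tp + fn)     # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall)    # → 0.8 0.8
print(round(f1, 3))         # → 0.8
```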

Naive Bayes

| File | Comments | Type |
| ---- | -------- | ---- |
| _About Naive Bayes | Bayes Theorem in the NB algorithm for both binary and multi-classification tasks. | Binary and Multiclass |
| Running Gaussian Naive Bayes with sklearn | Running Naive Bayes Classification with sklearn | Code |
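A minimal from-scratch sketch of Gaussian Naive Bayes (estimating a per-class Gaussian and prior, then applying Bayes' theorem) on assumed 1-D toy data, rather than the sklearn version the note covers:

```python
import numpy as np

# Two 1-D Gaussian classes: class 0 centred at -2, class 1 at +2 (assumed)
rng = np.random.default_rng(3)
X = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
y = np.array([0] * 100 + [1] * 100)

# "Training" = estimating the per-class mean, variance, and prior
means = np.array([X[y == c].mean() for c in (0, 1)])
vars_ = np.array([X[y == c].var() for c in (0, 1)])
priors = np.array([np.mean(y == c) for c in (0, 1)])

def predict(x):
    # Gaussian likelihood times prior for each class (Bayes' theorem,
    # dropping the shared evidence term, then taking the argmax)
    likelihood = np.exp(-((x - means) ** 2) / (2 * vars_)) / np.sqrt(2 * np.pi * vars_)
    return np.argmax(likelihood * priors)

preds = np.array([predict(x) for x in X])
print(np.mean(preds == y))  # high, but below 1.0: the classes overlap
```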

k-Nearest Neighbours (kNN)*

Can be used for regression tasks

| File | Comments | Type |
| ---- | -------- | ---- |
| _About kNN | How the algorithm works using Euclidean distance. | Binary and Multi-class classification, Regression |
| {Code} Running k-NN with sklearn | Building a kNN classifier in Python using sklearn | - |
| Finding the Optimal k | How to find the optimal k for kNN. | - |
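A bare-bones kNN classifier using Euclidean distance, as described in _About kNN (the toy clusters and k=5 are assumptions; the sklearn version is in the linked note):

```python
import numpy as np

# Two assumed 2-D clusters, one per class
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

def knn_predict(X_train, y_train, x, k=5):
    # Euclidean distance from x to every training point
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # Majority vote among the labels of the k nearest neighbours
    nearest = y_train[np.argsort(dists)[:k]]
    return np.bincount(nearest).argmax()

preds = np.array([knn_predict(X, y, x, k=5) for x in X])
print(np.mean(preds == y))  # near 1.0 on well-separated clusters
```

Note that kNN has no training step: the "model" is the stored data itself, and k is a hyperparameter tuned (e.g., by cross-validation) rather than learned.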

Decision Trees*

Can be used for regression tasks

| File | Comments | Type |
| ---- | -------- | ---- |
| _About Decision Trees | What decision trees are, how they determine which features to split the data on, and the importance of limiting the depth. | Binary or multi-classification, Regression |
| {Code} Running Decision Trees in sklearn | Building a decision tree classifier in Python using sklearn. | Code |
| Graphical Representation & Interpreting a Decision Tree Output | How to interpret the output of a decision tree and code building the graphical representation of the decision tree in Python. | Output |
| Understanding Entropy and Information Gain | How entropy and IG are used to select features of importance to split the data. | Theory |
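Entropy and information gain, as used to score a candidate split, can be computed directly (a small sketch with a made-up parent node and child split):

```python
import numpy as np

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Made-up split: a parent node with 4 samples of each class
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left = np.array([0, 0, 0, 1])   # left child after splitting on a feature
right = np.array([0, 1, 1, 1])  # right child

# Information gain = parent entropy minus the weighted child entropy
n = len(parent)
children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
info_gain = entropy(parent) - children
print(round(entropy(parent), 3))  # → 1.0 (a 50/50 node is maximally impure)
print(round(info_gain, 3))        # → 0.189
```

The tree greedily picks, at each node, the feature and threshold whose split yields the largest information gain.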