Supervised learning: who’s supervising the forest?

Supervised learning trains an algorithm on labeled data, pairing each input with its correct output; unsupervised learning instead finds patterns in unlabeled data. For example, predicting pizza delivery tips uses features like time, pizza type, distance, and tip history, with the tip itself as the label the model learns to predict.
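
As a minimal sketch of that setup, one could fit a random forest on a handful of labeled deliveries and ask it to predict the tip for a new one. The numbers and feature choices below are invented for illustration (and the categorical "pizza type" feature is omitted for simplicity); this assumes scikit-learn, not the post's actual data:

```python
# A toy labeled dataset: each delivery (input) is paired with the tip
# it actually received (correct output). All numbers are invented.
from sklearn.ensemble import RandomForestRegressor

# Features per delivery: [time of day (h), distance (km), past avg tip ($)]
X = [[18, 3.0, 2.5],
     [21, 7.5, 1.0],
     [12, 1.2, 3.0],
     [19, 5.0, 1.5]]
y = [2.5, 0.5, 3.0, 1.5]  # labels: the tips actually received ($)

model = RandomForestRegressor(random_state=0).fit(X, y)
print(model.predict([[20, 4.0, 2.0]]))  # predicted tip for an unseen delivery
```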

A comic guide to train-test split + Python & R codes

After collecting and preprocessing the dataset, it is essential to divide it into two distinct sets: a training set and a testing set. The training set is used to fit the model, while the testing set is used to evaluate its performance; this reveals how well the model generalizes to new data. Two code examples in Python and R demonstrate how to create synthetic data and split it into training and testing sets using popular libraries.
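
A sketch of the Python side (the post's exact code may differ; this version assumes NumPy and scikit-learn):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, 3 features, and a noisy linear target.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=100)

# Hold out 20% of the rows for testing; the rest is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```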

Residual plots for model diagnostics

Assessing assumptions like linearity, constant variance, independent errors, and normally distributed residuals is essential for linear regression. Residual plots visually assess the model's goodness of fit, revealing patterns and influential data points. This post provides Python and R code for the residual plot.
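
A minimal residual plot on synthetic data (a sketch assuming NumPy, scikit-learn, and matplotlib; the post's own code may differ):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic data with a genuinely linear relationship plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=1.5, size=100)

model = LinearRegression().fit(X, y)
fitted = model.predict(X)
residuals = y - fitted

# Residuals vs. fitted values: a patternless band around zero supports
# the linearity and constant-variance assumptions.
plt.scatter(fitted, residuals)
plt.axhline(0, linestyle="--", color="gray")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```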

Simple Linear Regression & Least square method

Simple linear regression is a statistical method for modeling the relationship between two continuous variables, predicting the dependent variable from the independent variable. The regression equation is Y = a + bX, where Y is the dependent variable, X is the independent variable, a is the intercept, and b is the slope. The method of least squares finds the best-fitting coefficients by minimizing the sum of squared residuals.
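
The least-squares coefficients have a closed form and can be computed directly. A small from-scratch sketch with made-up points, assuming NumPy:

```python
import numpy as np

# Made-up sample points roughly following y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# Least squares: b = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#                a = mean(y) - b * mean(x)
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
print(f"Y = {a:.2f} + {b:.2f}X")  # the best-fitting line
```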

Model ensembling

Model ensembling combines multiple models to improve overall performance by leveraging diverse data patterns. Bagging trains model instances on different bootstrap samples of the data, Boosting corrects errors sequentially, Stacking combines models through a meta-model, and Voting takes the majority or average of the individual predictions. Ensembles reduce variance without significantly increasing bias, but can complicate interpretation and increase computational cost.
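
As one concrete flavour, here is a sketch of hard voting over three diverse base models, assuming scikit-learn (the post may use different models or ensembling methods):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Hard voting: each base model casts one vote per sample and the
# majority class wins.
ensemble = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(random_state=0)),
])
print(cross_val_score(ensemble, X, y).mean())  # cross-validated accuracy
```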

Backward feature selection + example

Backward feature selection iteratively removes the least significant feature from a model, judged here by adjusted R-squared. In this example, which predicts the number of nuts collected by squirrels, features like temperature and rainfall survive the procedure as significant predictors. The process aims to end with a model containing only the most influential features.
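
A minimal sketch of the elimination loop: the squirrel data below is fabricated for illustration, and statsmodels is an assumption rather than the post's confirmed tooling:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical squirrel data: nuts collected as a function of weather.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "temperature": rng.normal(15, 5, 100),
    "rainfall": rng.normal(50, 10, 100),
    "noise": rng.normal(0, 1, 100),        # deliberately irrelevant
})
df["nuts"] = 3 * df["temperature"] + 0.5 * df["rainfall"] + rng.normal(0, 5, 100)

features = ["temperature", "rainfall", "noise"]
best = sm.OLS(df["nuts"], sm.add_constant(df[features])).fit().rsquared_adj

while len(features) > 1:
    # Score every one-feature removal by the adjusted R² of the refit model.
    scores = {}
    for f in features:
        remaining = [g for g in features if g != f]
        fit = sm.OLS(df["nuts"], sm.add_constant(df[remaining])).fit()
        scores[f] = fit.rsquared_adj
    drop, score = max(scores.items(), key=lambda kv: kv[1])
    if score <= best:
        break  # no removal improves adjusted R²; stop
    features.remove(drop)
    best = score

print(features)  # "noise" should be dropped; temperature and rainfall remain
```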

Estimating the sparse inverse covariance matrix (precision matrix) by Graphical Lasso (with Python implementation)

Graphical Lasso, also known as GLasso, is a statistical technique for estimating the sparse inverse covariance matrix (precision matrix) of a multivariate Gaussian distribution. Here, sparsity means that many elements of the matrix are zero; a zero entry indicates that the corresponding pair of variables is conditionally independent given the rest.
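
A minimal sketch using scikit-learn's GraphicalLasso estimator, as a stand-in for the post's full implementation; the toy precision matrix below is made up:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# A hand-built sparse precision matrix: the tridiagonal structure means
# variables 0 and 2 are conditionally independent given variable 1.
precision = np.array([[2.0, 0.6, 0.0],
                      [0.6, 2.0, 0.6],
                      [0.0, 0.6, 2.0]])
cov = np.linalg.inv(precision)

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), cov, size=500)

# The L1 penalty (alpha) pushes weak partial correlations to exact zeros.
model = GraphicalLasso(alpha=0.05).fit(X)
print(np.round(model.precision_, 2))  # zeros mark conditional independence
```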
