AI – Page 11 – Knowledge sparks

Simple linear regression using train-test split in Python & R

by Kurious Fox
June 17, 2024 (updated October 12, 2025)

An example of performing simple linear regression with a train-test split. The process is as follows: 1. Generate a synthetic dataset. 2. Split the dataset: we use train_test_split to divide the data into training and…
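The steps above can be sketched as follows. This is a minimal, NumPy-only sketch and not the post's code. The post uses scikit-learn's `train_test_split`, whereas here the split is done by shuffling indices so the example stays self-contained. The true slope (3), the intercept (4) and the 80/20 split are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# 1. Synthetic dataset: y = 3x + 4 + Gaussian noise (coefficients chosen for illustration)
n = 100
X = rng.uniform(0, 10, size=n)
y = 3 * X + 4 + rng.normal(0, 1, size=n)

# 2. Train-test split: shuffle the indices, keep 80% for training and 20% for testing
idx = rng.permutation(n)
n_train = int(0.8 * n)
train, test = idx[:n_train], idx[n_train:]

# 3. Fit y = b0 + b1 * x on the training data by ordinary least squares
A = np.column_stack([np.ones(n_train), X[train]])
(b0, b1), *_ = np.linalg.lstsq(A, y[train], rcond=None)

# 4. Evaluate on the held-out test data only
y_pred = b0 + b1 * X[test]
mse = np.mean((y[test] - y_pred) ** 2)
print(f"intercept={b0:.2f}, slope={b1:.2f}, test MSE={mse:.3f}")
```

With scikit-learn, step 2 would instead be `train_test_split(X, y, test_size=0.2, random_state=42)` from `sklearn.model_selection`.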

Combining datasets to increase sample size

by Kurious Fox
June 16, 2024 (updated December 31, 2024)

Detailed information can be found in the paper "Combining Datasets to Improve Model Fitting" or its presentation slides. Summary: the key points of the paper are as follows: Problem and…

Expectation Maximization (EM) & implementation

by Kurious Fox
June 14, 2024 (updated October 12, 2024)

Expectation Maximization (EM) is an iterative algorithm used for finding maximum likelihood estimates of parameters in statistical models, particularly when the model involves latent variables (variables that are not directly observed). The algorithm is commonly…
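The E-step/M-step loop can be illustrated with a minimal sketch, which is not necessarily the post's implementation. It runs EM on a two-component 1-D Gaussian mixture, where the latent variable is the unobserved component label of each point. The simulated data (means -2 and 3), the percentile-based initialization and the stopping rule are assumptions for illustration.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_gmm_1d(x, n_iter=500, tol=1e-8):
    """EM for a 2-component 1-D Gaussian mixture; returns (weights, means, variances)."""
    # Initial guesses: means at the 25th/75th percentiles, overall variance, equal weights
    means = np.percentile(x, [25, 75]).astype(float)
    variances = np.full(2, x.var())
    weights = np.full(2, 0.5)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: resp[i, k] = P(component k | x_i, current parameters)
        dens = weights * normal_pdf(x[:, None], means, variances)  # shape (n, 2)
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        # M-step: re-estimate parameters from responsibility-weighted data
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        # Stop once the log-likelihood (at the E-step parameters) stops improving
        ll = np.log(total).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, variances

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 600), rng.normal(3, 1, 400)])  # simulated data
weights, means, variances = em_gmm_1d(x)
print(np.round(weights, 2), np.round(means, 2), np.round(variances, 2))
```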

A comic guide to denoising noisy data

by Kurious Fox
June 13, 2024 (updated October 12, 2024)

Handling noisy data is a crucial step in data preprocessing and analysis. In general, common approaches to managing noisy data include: 1. Data Cleaning, 2. Data Transformation, 3. Statistical Techniques, 4. Machine Learning…

A comical guide to Missing Not At Random (MNAR)

by Kurious Fox
June 13, 2024 (updated October 12, 2024)

Recall that Missing Not At Random (MNAR) is a type of missing data mechanism where the probability of missingness is related to the unobserved data itself. Here are some more examples of MNAR: In each…
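A small simulation shows why MNAR is troublesome. When the probability of missingness depends on the value itself, the values that remain observed form a biased sample. The income variable and the logistic missingness model below are hypothetical and serve only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical incomes (log-normal); the simulation knows every true value
income = rng.lognormal(mean=10, sigma=0.5, size=10_000)

# MNAR mechanism: the higher the income itself, the more likely it goes unreported
p_missing = 1 / (1 + np.exp(-4 * (np.log(income) - 10)))
missing = rng.uniform(size=income.size) < p_missing
observed = income[~missing]

print(f"true mean:     {income.mean():,.0f}")
print(f"observed mean: {observed.mean():,.0f}")  # lower than the true mean
```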

What’s Missing at Random (MAR)?

by Kurious Fox
June 13, 2024 (updated October 12, 2025)

Missing at Random (MAR) is a statistical term indicating that the likelihood of data being missing is related to some of the observed data but not to the missing data itself. This means that the…
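The following hypothetical simulation contrasts MAR with MNAR. Blood pressure goes missing more often for older people, and age is always observed. Given age, however, missingness does not depend on blood pressure itself. The complete-case mean is therefore biased, while a regression imputation that uses the observed age largely removes the bias. The variables, coefficients and missingness rates are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
age = rng.uniform(20, 80, size=n)                  # always observed
bp = 90 + 0.6 * age + rng.normal(0, 8, size=n)     # hypothetical blood pressure

# MAR: missingness depends only on the observed age, not on bp itself
p_missing = np.where(age > 60, 0.6, 0.1)
missing = rng.uniform(size=n) < p_missing

# Complete-case mean is biased: older people (higher bp) drop out more often
cc_mean = bp[~missing].mean()

# Regression imputation using the observed age, fitted on complete cases
slope, intercept = np.polyfit(age[~missing], bp[~missing], deg=1)
bp_imputed = np.where(missing, intercept + slope * age, bp)

print(f"true mean          : {bp.mean():.2f}")
print(f"complete-case mean : {cc_mean:.2f}")
print(f"after imputation   : {bp_imputed.mean():.2f}")
```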

Multiple regression analysis: waiting time to log in to Windows

by Kurious Fox
June 11, 2024 (updated August 18, 2024)

Multiple regression analysis can be used to understand the relationship between the waiting time to log in to Windows (dependent variable) and several independent variables. Let’s assume we have the following independent variables: Suppose that…
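The post's list of independent variables is not shown here, so this sketch uses three hypothetical predictors: the number of startup apps, RAM in GB, and whether the disk is an SSD. It simulates waiting times and fits the multiple regression by ordinary least squares with NumPy. All variable names, coefficients and data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 1000
# Hypothetical predictors
startup_apps = rng.integers(0, 21, size=n)    # number of apps launched at login
ram_gb = rng.choice([4, 8, 16, 32], size=n)   # installed RAM
ssd = rng.integers(0, 2, size=n)              # 1 = SSD, 0 = HDD

# Simulated login waiting time in seconds (assumed "true" coefficients)
wait = 40 + 1.5 * startup_apps - 0.8 * ram_gb - 6 * ssd + rng.normal(0, 1, size=n)

# Design matrix with an intercept column, then ordinary least squares
X = np.column_stack([np.ones(n), startup_apps, ram_gb, ssd])
coef, *_ = np.linalg.lstsq(X, wait, rcond=None)

for name, b in zip(["intercept", "startup_apps", "ram_gb", "ssd"], coef):
    print(f"{name:>12}: {b:6.2f}")
```

Each fitted coefficient estimates the change in waiting time for a one-unit change in that predictor while the others are held fixed.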

The success rates of Cupid’s arrows

by Kurious Fox
June 2, 2024 (updated September 20, 2024)

I advised a master’s student to use the binomial probability formula to calculate the probability that k out of 15 girls return their affection, given that Cupid’s arrows succeed with probability 0.7. For these parameters, the most likely outcome is that 11 girls reciprocate, with a probability of about 0.22 (10 girls is a close second at about 0.21).
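The binomial probabilities for 15 girls and a success rate of 0.7 can be computed with Python's standard library. This is an illustrative sketch, not the code given to the student.

```python
from math import comb

n, p = 15, 0.7  # 15 girls, Cupid's success rate 0.7
# Binomial pmf: P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

mode = max(range(n + 1), key=lambda k: pmf[k])
for k in (9, 10, 11, 12):
    print(k, round(pmf[k], 4))
print("most likely k:", mode)  # 11
```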

Grazing the maze of probability

by Kurious Fox
May 31, 2024 (updated September 15, 2024)

Supplementary materials for the sections "Grazing the maze of probability" and "A random variable mood" in the KSML app. Basic rules of probability: mutually exclusive events; conditional probability for medical testing in a forest. The conditional probability…

Estimating the sparse inverse covariance matrix (precision matrix) by Graphical Lasso (with Python implementation)

by Kurious Fox
May 28, 2024 (updated May 5, 2025)

Graphical Lasso, also known as GLasso, is a statistical technique used for estimating the sparse inverse covariance matrix (precision matrix) of a multivariate Gaussian distribution. Here, sparsity means that many elements of the matrix are…
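Here is a minimal sketch using scikit-learn's `GraphicalLasso` estimator, which may differ from the post's implementation. The data are simulated from a Gaussian whose precision matrix is tridiagonal (a chain of dependencies), so the fitted `precision_` should show near-zero entries where the true precision is zero. The true matrix and the penalty `alpha=0.05` are assumptions for illustration.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(7)
# Hypothetical sparse precision matrix: a chain 0 - 1 - 2 - 3 (tridiagonal)
precision_true = np.array([
    [2.0, 0.6, 0.0, 0.0],
    [0.6, 2.0, 0.6, 0.0],
    [0.0, 0.6, 2.0, 0.6],
    [0.0, 0.0, 0.6, 2.0],
])
cov_true = np.linalg.inv(precision_true)
X = rng.multivariate_normal(np.zeros(4), cov_true, size=2000)

# alpha is the L1 penalty on off-diagonal precision entries; larger alpha = sparser
model = GraphicalLasso(alpha=0.05).fit(X)
precision_est = model.precision_
print(np.round(precision_est, 2))
```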

Generating missing data and evaluating missing data analysis in Python

by Kurious Fox
May 28, 2024 (updated October 12, 2024)

Generating missing values: generating missing values with a given percentage of missingness for a DataFrame or NumPy array; generating missing values with a given missing rate for a time-series list; calculating MSE ignoring missing…
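Here is a minimal sketch of these utilities. It assumes missing positions are chosen uniformly at random (MCAR) and that exactly round(rate × size) entries should be removed. The function names `add_missing` and `mse_ignoring_missing` are hypothetical, not the post's.

```python
import numpy as np

def add_missing(data, missing_rate, seed=None):
    """Return a float copy of `data` with round(missing_rate * size) entries set to NaN.

    Accepts a list (e.g. a time series), a NumPy array, or numeric DataFrame values;
    the result is always a NumPy array.
    """
    rng = np.random.default_rng(seed)
    arr = np.array(data, dtype=float)             # copy; float so it can hold NaN
    n_missing = int(round(missing_rate * arr.size))
    idx = rng.choice(arr.size, size=n_missing, replace=False)
    arr.flat[idx] = np.nan                        # positions chosen uniformly at random
    return arr

def mse_ignoring_missing(y_true, y_pred):
    """Mean squared error over positions where neither input is NaN."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = ~np.isnan(y_true) & ~np.isnan(y_pred)
    return np.mean((y_true[mask] - y_pred[mask]) ** 2)

series = list(range(100))                         # a toy time series
series_missing = add_missing(series, 0.2, seed=0)
print(np.isnan(series_missing).sum())             # 20
print(mse_ignoring_missing([1, np.nan, 3], [2, 5, 3]))  # 0.5
```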

The conditional probability of Tom finding Jerry

by Kurious Fox
May 27, 2024 (updated September 20, 2024)

My all-time favourite catch is “JERRY catching TOM!” Little Jerry is so smart, and did you know that he knows probability as well? One day, Jerry was thinking, “Hmm, every time Tom chases…

Introduction to Principal Component Analysis (PCA) and implementation in R and Python

by Kurious Fox
May 26, 2024 (updated February 15, 2026)

Principal Component Analysis (PCA) is a statistical technique used for dimensionality reduction, which simplifies the complexity in high-dimensional data while retaining important information. The basic idea of this method is to transform a large set…
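Here is a minimal NumPy sketch of PCA that eigendecomposes the sample covariance matrix. It is one standard way to compute PCA, not necessarily the post's R or Python code. The simulated data, whose variance lies mostly along the first axis, are an assumption for illustration.

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the sample covariance matrix."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # sort by variance, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]        # principal directions (columns)
    scores = X_centered @ components              # data projected onto them
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return scores, components, explained_ratio

rng = np.random.default_rng(9)
# 3 features with very different spreads: most variance lies along the first axis
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.1])
scores, components, explained_ratio = pca(X, n_components=2)
print(scores.shape, np.round(explained_ratio, 3))
```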

Bayes theorem in finance of a magical forest

by Kurious Fox
May 25, 2024 (updated August 18, 2024)

Here, we denote by $\bar{A}$ the event NOT $A$. Example 1: Magical Investment Returns. In the magical forest, gnomes invest in enchanted acorns, which sometimes turn into golden trees. A gnome named Glim invests in an…

Conditional probability

by Kurious Fox
May 25, 2024 (updated February 15, 2026)

Here, we denote by $\bar{A}$ the event NOT $A$. Example 1: Squirrel Flu Testing. In a forest, a group of squirrels is concerned about a new illness called “Squirrel Flu.” It’s more dangerous than the ordinary…

Tricks for remembering formulas & properties of the multivariate normal distribution

by Kurious Fox
May 23, 2024 (updated October 12, 2025)

First, it’s easier to understand and remember the key properties of the multivariate normal distribution once you understand the Mahalanobis distance. So, to start, recall that the Mahalanobis distance is a measure of the distance between a…
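The Mahalanobis distance makes one property easy to remember. If $X \sim N(\mu, \Sigma)$ in $p$ dimensions, the squared distance $(X-\mu)^\top \Sigma^{-1} (X-\mu)$ follows a chi-square distribution with $p$ degrees of freedom, so its mean is $p$ and its variance is $2p$. The simulation below checks this numerically. The mean vector and covariance matrix are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 0.5]])  # positive definite (all leading minors > 0)
p = len(mu)

X = rng.multivariate_normal(mu, Sigma, size=50_000)
diff = X - mu
# Squared Mahalanobis distance of each row: diff_i^T Sigma^{-1} diff_i
d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(Sigma), diff)

print(round(d2.mean(), 2), round(d2.var(), 2))  # close to p = 3 and 2p = 6
```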

Permutation

by Kurious Fox
May 20, 2024 (updated February 15, 2026)

A permutation is an arrangement of objects in a specific order; in permutations, the order of arrangement matters. Counting permutations tells us how many different ways a set or number of things…
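Here is a small illustration with a made-up example of 5 books and 3 shelf slots. The number of ordered arrangements of k objects chosen from n is n!/(n−k)!, which Python's standard library exposes as `math.perm`. `itertools.permutations` lists the arrangements themselves.

```python
import math
from itertools import permutations

n, k = 5, 3  # arrange 3 of 5 books on a shelf (hypothetical example)
count = math.perm(n, k)  # n! / (n - k)! = 5 * 4 * 3
print(count)  # 60

# Order matters: ('A', 'B', 'C') and ('B', 'A', 'C') are counted separately
arrangements = list(permutations("ABC"))
print(len(arrangements))  # 6
print(arrangements[:2])   # [('A', 'B', 'C'), ('A', 'C', 'B')]
```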

Examples of combinations

by Kurious Fox
May 20, 2024 (updated February 15, 2026)

Previous: Combinations definition and quizzes

Examples of Exponential distribution

by Kurious Fox
May 19, 2024 (updated February 15, 2026)

The exponential distribution is commonly used to model the time between events in a Poisson process. It is defined by a single parameter, $\lambda$, which is the rate parameter. The probability density function (PDF) of…
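For rate λ, the exponential PDF is f(x) = λe^(−λx) for x ≥ 0, the CDF is F(x) = 1 − e^(−λx), and the mean waiting time is 1/λ. The sketch below evaluates these and checks the mean by sampling with `random.expovariate`, which takes the rate as its argument. The value λ = 0.5 is hypothetical.

```python
import math
import random

lam = 0.5  # rate parameter (hypothetical): on average 0.5 events per unit time

def pdf(x):
    # f(x) = lam * exp(-lam * x) for x >= 0
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def cdf(x):
    # F(x) = 1 - exp(-lam * x) for x >= 0
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

random.seed(0)
samples = [random.expovariate(lam) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)

print(pdf(0))                  # 0.5 (equal to lam)
print(round(cdf(2), 3))        # 0.632, i.e. P(wait <= 2) = 1 - e^(-1)
print(round(sample_mean, 2))   # close to 1 / lam = 2
```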

Denoising via dimension reduction in Python

by Kurious Fox
May 19, 2024 (updated October 12, 2024)

Dimension reduction methods like Principal Component Analysis (PCA) or Singular Value Decomposition (SVD) can be used for denoising data because they work by retaining the most important features (or dimensions) that capture the majority of…
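This sketch denoises a matrix with a truncated SVD. Keeping only the top-k singular components discards directions that mostly carry noise. It assumes the clean data are exactly rank 2 and that k = 2 is known, which real data rarely guarantee. The data are simulated.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical clean data of rank 2 (100 samples x 50 features) plus Gaussian noise
clean = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 50))
noisy = clean + rng.normal(scale=0.5, size=clean.shape)

# Keep only the top-k singular components (k assumed known here)
U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
k = 2
denoised = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
print(f"error before: {err_noisy:.1f}, after: {err_denoised:.1f}")
```

In practice k is usually chosen by inspecting the singular values or the explained-variance ratio.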

Why we can & probably should use missing at random imputation methods for data that’s not missing at random?

by Kurious Fox
May 17, 2024 (updated October 12, 2024)

Missing At Random (MAR) imputation methods are based on the assumption that the chance of missing data is not related to the missing data itself, but might be related to some of the observed data.…

Missing data analysis: where’s your missing piece?

by Kurious Fox
May 17, 2024 (updated September 6, 2024)

Missing data can occur for various reasons, including human error, malfunctioning equipment, or even intentional omission. It is important to handle missing data because it can significantly impact the reliability and accuracy…

Imputation using SoftImpute in python

by Kurious Fox
May 17, 2024 (updated October 12, 2024)

SoftImpute is a matrix completion algorithm, available in Python, that allows you to fill in missing data in your dataset. It is based on Singular Value Decomposition (SVD) with iterative soft-thresholding of the singular values. Here’s a basic…
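The core SoftImpute iteration can be sketched in NumPy as follows. First, fill the missing entries with the current low-rank estimate. Then take an SVD, soft-threshold the singular values by λ, and repeat. This is a from-scratch sketch of the algorithm, not the package API the post uses. The shrinkage value `lam`, the zero starting point and the convergence rule are assumptions.

```python
import numpy as np

def soft_impute(X, lam, n_iter=200, tol=1e-5):
    """Fill NaNs in X with a low-rank estimate via iterative singular-value soft-thresholding."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    Z = np.zeros_like(X)                          # initial estimate for missing entries
    for _ in range(n_iter):
        filled = np.where(observed, X, Z)         # observed values + current guesses
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s_shrunk = np.maximum(s - lam, 0.0)       # soft-threshold singular values by lam
        Z_new = (U * s_shrunk) @ Vt               # low-rank reconstruction
        change = np.linalg.norm(Z_new - Z) ** 2 / max(np.linalg.norm(Z) ** 2, 1e-12)
        Z = Z_new
        if change < tol:
            break
    return np.where(observed, X, Z)               # never overwrite observed entries

rng = np.random.default_rng(8)
X_true = rng.normal(size=(60, 3)) @ rng.normal(size=(3, 40))   # rank-3 matrix
X_obs = X_true.copy()
X_obs[rng.uniform(size=X_obs.shape) < 0.2] = np.nan            # hide ~20% of entries
X_filled = soft_impute(X_obs, lam=1.0)
rmse = np.sqrt(np.mean((X_filled - X_true)[np.isnan(X_obs)] ** 2))
print(f"RMSE on the hidden entries: {rmse:.3f}")
```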

Multiple Imputation by Chained Equations method & Python code

by Kurious Fox
May 17, 2024 (updated October 12, 2024)

MICE (Multiple Imputation by Chained Equations) is a statistical method used for handling missing data by creating multiple imputations or “guesses” for the missing values. It works by using a set of regression models to…
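Here is a sketch of multiple imputation with scikit-learn's `IterativeImputer`, which imputes each feature from the others in round-robin fashion. It may differ from the post's code. With `sample_posterior=True` and a different `random_state` per run, each run yields a different plausible completed dataset. The simulated data and m = 5 are assumptions. Pooling here only averages the point estimates (the first step of Rubin's rules); a full analysis would also combine the within- and between-imputation variances.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(5)
n = 300
x1 = rng.normal(size=n)
x2 = 2 * x1 + rng.normal(scale=0.5, size=n)
X = np.column_stack([x1, x2])
X_missing = X.copy()
X_missing[rng.uniform(size=n) < 0.2, 1] = np.nan    # ~20% of x2 missing

# m imputed datasets; sample_posterior=True draws from the predictive distribution
m = 5
imputations = [
    IterativeImputer(sample_posterior=True, max_iter=10, random_state=i).fit_transform(X_missing)
    for i in range(m)
]

# Analyse each completed dataset, then pool the point estimates
estimates = [data[:, 1].mean() for data in imputations]
print("per-imputation means:", np.round(estimates, 3))
print("pooled estimate:", round(float(np.mean(estimates)), 3))
```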

K-Nearest Neighbors (KNN) imputation in sklearn

by Kurious Fox
May 17, 2024 (updated October 12, 2024)

K-Nearest Neighbors (KNN) imputation is another method to handle missing data. It uses the ‘k’ closest instances (rows) to each instance that contains any missing values to fill in those values. In sklearn, you can…
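Here is a small example with scikit-learn's `KNNImputer`. Each missing value is replaced by the mean of that feature over the k nearest rows that have it observed. Distances use a NaN-aware Euclidean metric over the features both rows share. The toy matrix and `n_neighbors=2` are illustrative choices.

```python
import numpy as np
from sklearn.impute import KNNImputer

X = [[1, 2, np.nan],
     [3, 4, 3],
     [np.nan, 6, 5],
     [8, 8, 7]]

# Uniform-weighted mean over the 2 nearest neighbours that have the feature observed
imputer = KNNImputer(n_neighbors=2)
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```

Here the missing value in the first row becomes 4.0, the mean of 3 and 5 from its two nearest rows. The missing value in the third row becomes 5.5, the mean of 3 and 8.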

A comic guide to mean/median/mode imputation & Python code

by Kurious Fox
May 17, 2024 (updated October 12, 2024)

Handling missing data is a common preprocessing task in machine learning. In scikit-learn, you can handle missing data by using imputation techniques provided by the SimpleImputer class or by employing other strategies like dropping rows/columns with missing…
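Here is a minimal example of the three `SimpleImputer` strategies on a toy matrix (the values are made up). Each strategy computes one fill value per column from the observed entries. For `most_frequent`, scikit-learn returns the smallest value when there is a tie.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 4.0],
              [8.0, 10.0]])

X_mean = SimpleImputer(strategy="mean").fit_transform(X)            # column means: 4.0, 6.0
X_median = SimpleImputer(strategy="median").fit_transform(X)        # column medians: 3.0, 4.0
X_mode = SimpleImputer(strategy="most_frequent").fit_transform(X)   # modes: 1.0 (tie), 4.0
print(X_mean, X_median, X_mode, sep="\n\n")
```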

SVD for dimension reduction

by Kurious Fox
May 16, 2024 (updated August 18, 2024)

Singular Value Decomposition (SVD) is a powerful matrix decomposition technique that generalizes the concept of eigenvalue decomposition to non-square matrices. Eigenvalue decomposition specifically decomposes a square matrix into its constituent eigenvalues and eigenvectors. This decomposition…
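The sketch below applies SVD to a non-square 2 × 3 matrix, where eigenvalue decomposition does not apply. It checks the link to eigendecomposition (the squared singular values are the eigenvalues of AAᵀ) and reduces the rows to k = 1 dimension. The matrix and k are illustrative choices.

```python
import numpy as np

# A non-square (2 x 3) matrix
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U @ diag(s) @ Vt
print(np.round(s, 3))  # [3.464 3.162], i.e. sqrt(12) and sqrt(10)

# Squared singular values are the eigenvalues of A A^T (eigvalsh sorts ascending)
print(np.linalg.eigvalsh(A @ A.T))  # approximately [10, 12]

# Dimension reduction: project rows onto the top-k right singular vectors
k = 1
A_reduced = A @ Vt[:k].T        # shape (2, 1); equals U[:, :k] * s[:k]
A_approx = A_reduced @ Vt[:k]   # best rank-k approximation of A
```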

Test for outliers in multivariate data in Python

by Kurious Fox
May 16, 2024 (updated August 18, 2024)

To test for outliers in multivariate data in Python, you can use several libraries like numpy, scipy, pandas, sklearn, etc. Here’s how you can do it. Mahalanobis distance using the SciPy library: The Mahalanobis distance is a statistical measure used…
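Here is a minimal sketch with `scipy.spatial.distance.mahalanobis`, which returns the (non-squared) distance given the inverse covariance `VI`. A point is flagged when its squared distance exceeds the 0.999 quantile of a chi-square distribution with p degrees of freedom. That cutoff assumes roughly multivariate normal data. The data and the planted outlier are simulated.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis
from scipy.stats import chi2

rng = np.random.default_rng(6)
# 200 correlated 2-D points plus one planted outlier that breaks the correlation
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=200)
X = np.vstack([X, [6.0, -6.0]])

center = X.mean(axis=0)
VI = np.linalg.inv(np.cov(X, rowvar=False))  # inverse covariance matrix
dist = np.array([mahalanobis(row, center, VI) for row in X])

# Compare distances with the square root of the chi-square quantile
threshold = np.sqrt(chi2.ppf(0.999, df=X.shape[1]))
outliers = np.flatnonzero(dist > threshold)
print("outlier indices:", outliers)
```

The mean and covariance here include the outlier itself. Robust estimates, such as scikit-learn's `MinCovDet`, are less affected by it.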

Application of Bayes’ theorem in spam detection & medical diagnosis

by Kurious Fox
May 2, 2024 (updated October 1, 2024)

Example 1: Spam Detection. Let’s say that historically 20% of emails are spam, so $P(\text{Spam}) = 0.2$, and the probability that an email is not spam is $P(\text{Not Spam}) = 0.8$. Suppose the probability of observing the word “free” in a spam…
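Bayes' theorem, $P(\text{Spam} \mid \text{free}) = P(\text{free} \mid \text{Spam})\,P(\text{Spam}) / P(\text{free})$, takes only a few lines to compute. The 20% spam rate comes from the example above. The post's likelihoods are cut off here, so the two used below (0.6 for spam, 0.05 for non-spam) are hypothetical.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    # Denominator P(evidence) via the law of total probability
    evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / evidence

p_spam = 0.2              # prior: 20% of emails are spam
p_free_given_spam = 0.6   # hypothetical likelihood
p_free_given_ham = 0.05   # hypothetical likelihood (ham = not spam)

p_spam_given_free = posterior(p_spam, p_free_given_spam, p_free_given_ham)
print(round(p_spam_given_free, 3))  # 0.75
```

The same function applies to medical diagnosis if you substitute disease prevalence, test sensitivity, and the false-positive rate.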
