
Expectation Maximization (EM) & implementation

Expectation Maximization (EM) is an iterative algorithm used for finding maximum likelihood estimates of parameters in statistical models, particularly when the model involves latent variables (variables that are not directly observed). The algorithm is commonly… 
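To make the idea concrete, here is a minimal sketch of EM for a mixture of two one-dimensional Gaussians, where the latent variable is which component produced each point. The function names, the toy data, and the crude initialisation are all assumptions made for illustration, not the post's own implementation.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, n_iter=100):
    """Fit a two-component 1D Gaussian mixture with plain EM (illustrative only)."""
    # Crude initialisation (an assumption): start the means at the data extremes.
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weight[k] * gaussian_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(spread, 1e-6)  # floor keeps the variance from collapsing to zero
    return weight, mu, var

# Toy data (made up): one cluster near 0 and one near 5.
data = [-0.4, 0.1, 0.3, -0.2, 0.2, 4.8, 5.1, 5.3, 4.9, 5.0, 4.7]
weight, mu, var = em_two_gaussians(data)
print("weights:", weight)
print("means:", mu)
print("variances:", var)
```

On well-separated data like this the responsibilities become almost hard assignments after one pass; on overlapping clusters the result depends much more on the starting values.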

A comic guide to denoising noisy data

Handling noisy data is a crucial step in data preprocessing and analysis. In general, here are some common approaches to manage noisy data: 1. Data Cleaning 2. Data Transformation 3. Statistical Techniques 4. Machine Learning… 

A comical guide to Missing Not At Random (MNAR)

Recall that Missing Not At Random (MNAR) is a type of missing data mechanism where the probability of missingness is related to the unobserved data itself. Here are some more examples of MNAR: In each… 

What’s Missing at Random (MAR)?

Missing at Random (MAR) is a statistical term indicating that the likelihood of data being missing is related to some of the observed data but not to the missing data itself. This means that the… 

The success rates of Cupid’s arrows

I advised a master’s student to use the binomial probability formula to work out how many of 15 girls are likely to reciprocate when each of Cupid’s arrows succeeds with probability 0.7. With n = 15 and p = 0.7, the most likely outcome is 11 girls reciprocating, with a probability of about 0.22; 10 girls is a close second at about 0.21.
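Figures like these can be checked with a few lines of Python. This is a small sketch using only the standard library; the helper name `binomial_pmf` is ours, and the values of n and p are the ones stated in the teaser.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 15, 0.7  # 15 arrows, each succeeding with probability 0.7
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The most likely number of successes is the k with the largest probability.
most_likely = max(range(n + 1), key=lambda k: pmf[k])

for k in range(8, 14):
    print(f"P(X = {k}) = {pmf[k]:.4f}")
print("most likely number of successes:", most_likely)
```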

Grazing the maze of probability

Supplementary materials for the sections “Grazing the maze of probability” and “A random variable mood” in the KSML app: Basic rules of probability: Mutually exclusive events. Conditional probability for medical testing in a forest. The conditional probability… 

Bayes theorem in finance of a magical forest

Here, we denote by the event NOT . Example 1: Magical Investment Returns In the magical forest, gnomes invest in enchanted acorns, which sometimes turn into golden trees. A gnome named Glim invests in an… 

Conditional probability

Here, we denote by the event NOT . Example 1: Squirrel Flu Testing In a forest, a group of squirrels is concerned about a new illness called “Squirrel Flu.” It’s more dangerous than the ordinary… 

Permutation

A permutation is an arrangement of objects in a specific order, so the order of arrangement matters. A permutation lets us know how many different ways a set or number of things… 
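As a quick illustration, the standard library can both list permutations and count them. The three letters below are an arbitrary example chosen for this sketch.

```python
from itertools import permutations
from math import factorial, perm

items = ["A", "B", "C"]  # arbitrary example objects

# Every ordering of all three items: 3! of them.
all_orders = list(permutations(items))

# Ordered selections of 2 items out of 3: 3! / (3 - 2)! of them.
ordered_pairs = list(permutations(items, 2))

print(all_orders)
print(ordered_pairs)
print(len(all_orders), factorial(len(items)))
print(len(ordered_pairs), perm(len(items), 2))
```

Because order matters, `("A", "B")` and `("B", "A")` are counted as two different permutations.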

Examples of Exponential distribution

The exponential distribution is commonly used to model the time between events in a Poisson process. It is defined by a single rate parameter. The probability density function (PDF) of… 
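A small sketch of the standard density and cumulative distribution functions, plus a simulation of waiting times, may help here. The rate value of 2 events per unit time and the function names are assumptions chosen for illustration.

```python
import math
import random

def exp_pdf(x, rate):
    """Density of the exponential distribution: rate * exp(-rate * x) for x >= 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def exp_cdf(x, rate):
    """P(waiting time <= x) = 1 - exp(-rate * x) for x >= 0."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

rate = 2.0  # assumed: on average 2 events per unit of time

# Simulate waiting times between events and compare with the theoretical mean 1 / rate.
rng = random.Random(0)
waits = [rng.expovariate(rate) for _ in range(100_000)]
sample_mean = sum(waits) / len(waits)

print("theoretical mean:", 1 / rate)
print("simulated mean:", sample_mean)
print("P(wait <= 1):", exp_cdf(1.0, rate))
```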

Denoising via dimension reduction in Python

Dimension reduction methods like Principal Component Analysis (PCA) or Singular Value Decomposition (SVD) can be used for denoising data because they work by retaining the most important features (or dimensions) that capture the majority of… 
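The idea can be sketched with a truncated SVD in NumPy. Everything about the data here is an assumption made for the demo: the clean signal is synthetic and exactly rank 2, the noise level is 0.3, and the number of retained components is known in advance, which is rarely true in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic clean data (assumed): 50 rows, each a mix of two smooth base curves.
t = np.linspace(0, 1, 200)
basis = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])  # shape (2, 200)
coeffs = rng.normal(size=(50, 2))
clean = coeffs @ basis
noisy = clean + 0.3 * rng.normal(size=clean.shape)

def svd_denoise(X, rank):
    """Keep only the top `rank` singular components of the column-centred data."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean + (U[:, :rank] * s[:rank]) @ Vt[:rank]

denoised = svd_denoise(noisy, rank=2)

err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
print("error before denoising:", err_noisy)
print("error after denoising:", err_denoised)
```

The discarded components carry mostly noise, which is why the reconstruction sits closer to the clean signal than the noisy input does.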

Missing data analysis: where’s your missing piece?

Missing data can occur for various reasons, including human error, malfunctioning equipment, or even intentional omission. It is important to handle missing data because it can significantly impact the reliability and accuracy… 

Imputation using SoftImpute in python

SoftImpute is a matrix completion algorithm, with implementations available in Python, that fills in missing data in your dataset. The method is based on Singular Value Decomposition (SVD) and iterative soft thresholding. Here’s a basic… 
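To show what "SVD plus iterative soft thresholding" means, here is a hand-rolled NumPy sketch of the iteration rather than a call to any particular package. The function name `soft_impute`, the shrinkage value `lam`, the iteration count, and the toy rank-1 matrix are all assumptions for illustration.

```python
import numpy as np

def soft_impute(X, lam=0.1, n_iter=1000, tol=1e-9):
    """Fill NaNs by repeatedly soft-thresholding the singular values (a sketch)."""
    observed = ~np.isnan(X)
    Z = np.where(observed, X, 0.0)  # start with zeros in the missing cells
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s = np.maximum(s - lam, 0.0)  # soft thresholding of the singular values
        low_rank = (U * s) @ Vt
        # Keep the observed entries, take the low-rank estimate elsewhere.
        Z_new = np.where(observed, X, low_rank)
        if np.linalg.norm(Z_new - Z) <= tol * max(np.linalg.norm(Z), 1.0):
            Z = Z_new
            break
        Z = Z_new
    return Z

# Toy example (made up): a rank-1 matrix with one entry hidden.
truth = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
X = truth.copy()
X[1, 2] = np.nan  # the true value is 6

filled = soft_impute(X)
print("imputed value:", filled[1, 2], "true value:", truth[1, 2])
```

The imputed value is pulled slightly below the true one because the threshold shrinks the singular values; a smaller `lam` reduces that bias but makes the iteration converge more slowly.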

K-Nearest Neighbors (KNN) imputation in sklearn

K-Nearest Neighbors (KNN) imputation is another method to handle missing data. For each instance (row) that contains missing values, it fills them in using the ‘k’ closest instances. In sklearn, you can… 
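A minimal sketch with scikit-learn's `KNNImputer` looks like this. The small array and the choice of `n_neighbors=2` are assumptions for the demo; on real data, k and feature scaling both deserve some thought.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy data (made up) with two missing cells marked as NaN.
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Each missing value is replaced by the mean of that feature over the
# 2 nearest rows, with distances computed on the features both rows have.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(X_filled)
```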

A comic guide to mean/median/mode imputation & Python codes

Handling missing data is a common preprocessing task in machine learning. In scikit-learn, you can handle missing data by using imputation techniques provided by the SimpleImputer class or by employing other strategies like dropping rows/columns with missing… 
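Here is a short sketch of the three strategies with `SimpleImputer`. The toy array is made up, and the results are collected in a dictionary named `filled` purely for convenience.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (made up): one missing value in each column.
X = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [np.nan, 30.0],
    [2.0, 50.0],
])

filled = {}
for strategy in ("mean", "median", "most_frequent"):
    # Each strategy computes one statistic per column and uses it for every NaN there.
    filled[strategy] = SimpleImputer(strategy=strategy).fit_transform(X)
    print(strategy)
    print(filled[strategy])
```

The mean is sensitive to extreme values, the median is more robust, and the most frequent value (the mode) also works for categorical columns.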

SVD for dimension reduction

Singular Value Decomposition (SVD) is a powerful matrix decomposition technique that generalizes the concept of eigenvalue decomposition to non-square matrices. Eigenvalue decomposition specifically decomposes a square matrix into its constituent eigenvalues and eigenvectors. This decomposition… 

Test for outliers in multivariate data in Python

To test for outliers in multivariate data in Python, you can use several libraries like numpy, scipy, pandas, sklearn, etc. Here’s how you can do it. Mahalanobis distance using the SciPy library: The Mahalanobis distance is a statistical measure used… 
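One possible way to put the Mahalanobis distance to work is sketched below. The synthetic correlated data, the planted outlier, and the chi-square cutoff at the 0.999 quantile are all assumptions for this demo; the mean and covariance are also estimated from the contaminated data, which a robust method would avoid.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis
from scipy.stats import chi2

rng = np.random.default_rng(0)

# Synthetic data (assumed): 200 strongly correlated points plus one planted
# outlier that goes against the correlation.
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=200)
X = np.vstack([X, [[4.0, -4.0]]])

mean = X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))

# Mahalanobis distance of every row from the sample mean.
distances = np.array([mahalanobis(row, mean, inv_cov) for row in X])

# For roughly Gaussian data the squared distance follows a chi-square
# distribution with one degree of freedom per feature.
threshold = np.sqrt(chi2.ppf(0.999, df=X.shape[1]))
outliers = np.where(distances > threshold)[0]

print("threshold:", threshold)
print("flagged rows:", outliers)
```

Unlike a plain Euclidean distance, this measure accounts for the correlation between features, so a point can be flagged even when each of its coordinates looks unremarkable on its own.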
