Normalization & z-score
⭐ Z‑Score A z‑score tells you how many standard deviations an observation is from the mean: z = (x − μ) / σ. What it does Example Population mean , standard deviation . What is the z‑score of ? Interpretation: The value is 1.5…
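A minimal sketch of that definition in code; the numbers (mean 100, standard deviation 10, observation 115) are illustrative assumptions, not the note's own example:

```python
def z_score(x, mu, sigma):
    """How many standard deviations the observation x lies from the mean mu."""
    return (x - mu) / sigma

# Illustrative values: mean 100, standard deviation 10, observation 115.
print(z_score(115, 100, 10))  # 1.5 -> one and a half sd above the mean
```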
Histogram versus density
⭐ Histogram vs. Density Plot Both visualize distributions, but they answer slightly different questions and behave differently. 📊 Histogram A histogram groups data into bins and shows counts (or proportions) in each bin. Key features…
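One way to see the difference is to compute both from the same data: raw counts per bin versus counts rescaled so the bars integrate to 1, which is what a density view shows. A pure-Python sketch with made-up data:

```python
def histogram(data, bins, lo, hi):
    """Count how many observations fall into each of `bins` equal-width bins."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        i = min(int((x - lo) / width), bins - 1)  # clamp the top edge into the last bin
        counts[i] += 1
    return counts

data = [1, 2, 2, 3, 3, 3, 4, 4, 9]
counts = histogram(data, bins=4, lo=0, hi=10)
width = 10 / 4
# Density scaling: each bar becomes count / (n * bin_width), so total area is 1.
densities = [c / (len(data) * width) for c in counts]
print(counts, densities)
```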
geometric distribution is memoryless
A random variable X that follows a geometric distribution satisfies P(X > m + n | X > m) = P(X > n). This means: The probability you still have to wait n more trials does NOT depend on how long you’ve already been waiting. Your past failures don’t change…
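The property can be checked by simulation. This sketch (success probability 0.3 is an assumed value) compares P(X > 5 | X > 2) against P(X > 3):

```python
import random
random.seed(0)

def geometric(p):
    """Number of Bernoulli(p) trials up to and including the first success."""
    k = 1
    while random.random() >= p:  # each trial succeeds with probability p
        k += 1
    return k

samples = [geometric(0.3) for _ in range(100_000)]
# Memorylessness: P(X > 5 | X > 2) should equal P(X > 3).
cond = sum(x > 5 for x in samples) / sum(x > 2 for x in samples)
uncond = sum(x > 3 for x in samples) / len(samples)
print(round(cond, 2), round(uncond, 2))  # both close to 0.7**3
```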
geometric distribution
The geometric distribution models the number of trials needed until the first success occurs in a sequence of independent Bernoulli trials (like repeated coin flips). Think of it as the math of “How long until…
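A minimal sketch of the PMF, P(X = k) = (1 − p)^(k−1) · p, using a fair coin (p = 0.5) as the illustrative case:

```python
def geom_pmf(k, p):
    """P(first success on trial k): k - 1 failures, then one success."""
    return (1 - p) ** (k - 1) * p

p = 0.5  # fair coin
print(geom_pmf(1, p), geom_pmf(3, p))  # 0.5 0.125

# The mean number of trials is 1/p; a truncated sum approximates it well.
approx_mean = sum(k * geom_pmf(k, p) for k in range(1, 200))
print(round(approx_mean, 4))
```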
Binomial distribution
⭐ Binomial Distribution The binomial distribution models the number of successes in a fixed number of independent trials, where each trial has the same probability of success. Think of it as the math of “How…
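The PMF, P(X = k) = C(n, k) · p^k · (1 − p)^(n−k), can be written directly with `math.comb`; the coin-flip numbers below are illustrative:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success prob p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# P(exactly 2 heads in 4 fair flips) = C(4,2) * 0.5^4 = 6/16
print(binom_pmf(2, 4, 0.5))  # 0.375
# A valid PMF sums to 1 over k = 0..n.
print(sum(binom_pmf(k, 4, 0.5) for k in range(5)))
```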
expectation is linear
The expected value (mean) of random variables adds even if the variables are dependent. This is the magic part: Expectation is always linear — no independence required. Formally, for any random variables X and Y: E[X + Y] = E[X] + E[Y]. And…
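A quick numerical check with deliberately dependent variables (a die roll X and Y = 7 − X, an assumed example; Y is completely determined by X, yet the means still add):

```python
import random
random.seed(1)

xs = [random.randint(1, 6) for _ in range(100_000)]  # X: a fair die roll
ys = [7 - x for x in xs]                             # Y: perfectly dependent on X
sums = [x + y for x, y in zip(xs, ys)]               # X + Y is always exactly 7

mean = lambda v: sum(v) / len(v)
# E[X] + E[Y] equals E[X + Y] despite the dependence.
print(round(mean(xs) + mean(ys), 4), mean(sums))
```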
histogram
A histogram is a graph that shows how data are distributed by grouping values into bins (intervals) and showing how many observations fall into each bin. It’s perfect for visualizing: Think of it as stacking…
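The "stacking into bins" picture can be sketched in a few lines; the bin width and data below are made up:

```python
from collections import Counter

data = [3.2, 4.7, 5.1, 5.9, 6.4, 7.8, 8.1]
bin_width = 2
# Assign each value to the bin [k*width, (k+1)*width) it falls into.
bins = Counter(int(x // bin_width) * bin_width for x in data)
for lo in sorted(bins):
    print(f"[{lo}, {lo + bin_width}): {'#' * bins[lo]}")
```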
probability density
A probability density function describes the distribution of a continuous random variable. If X is continuous, its PDF is a function f such that P(a ≤ X ≤ b) = ∫_a^b f(x) dx. The key idea For continuous variables: The PDF is not a probability. Probability…
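A sketch of that key idea: evaluating the density gives no probability, but integrating it over an interval does. The exponential density here is an illustrative choice:

```python
from math import exp

lam = 1.0
f = lambda x: lam * exp(-lam * x)  # exponential PDF on x >= 0

def integrate(f, a, b, n=100_000):
    """Midpoint Riemann-sum approximation of the area under f on [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# P(0 <= X <= 1) is the area under the density, not a value of f itself.
prob = integrate(f, 0, 1)
print(round(prob, 4))  # close to 1 - e^{-1}
```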
probability mass function
A probability mass function is a function that gives the probability of each individual value of a discrete random variable. If X is a discrete random variable, then its PMF is p(x) = P(X = x). It tells you: A PMF…
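A PMF can be written as a plain mapping from values to probabilities; a fair six-sided die is the classic example:

```python
# PMF of a fair die: each outcome gets probability 1/6.
pmf = {face: 1 / 6 for face in range(1, 7)}

print(pmf[3])             # probability of rolling a 3
print(sum(pmf.values()))  # a valid PMF sums to 1
```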
Types of random variables
Most random variables fall into two big categories: Everything else is a refinement of these two. 🎯 1. Discrete Random Variables A discrete random variable takes countable values — usually integers. Key features Examples Common…
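The two categories in code, as a minimal illustration: a discrete draw picks from countable outcomes, a continuous draw can land anywhere in an interval:

```python
import random
random.seed(42)

# Discrete: countable outcomes (a die shows one of six integers).
die = random.randint(1, 6)
# Continuous: any real value in [0, 1).
u = random.random()

print(die in {1, 2, 3, 4, 5, 6})  # True
print(0.0 <= u < 1.0)             # True
```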
independent vs mutually exclusive
Independent vs. Mutually Exclusive 🎯 Mutually Exclusive (Disjoint) Events Two events are mutually exclusive if they cannot happen at the same time. Example: Rolling a die: 🎯 Independent Events Two events are independent if knowing one…
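The contrast can be checked on a single die roll; the events below (even, odd, "at most 2") are illustrative choices:

```python
from fractions import Fraction

outcomes = range(1, 7)  # one roll of a fair die
P = lambda event: Fraction(sum(1 for o in outcomes if event(o)), 6)

even = lambda o: o % 2 == 0
odd = lambda o: o % 2 == 1
low = lambda o: o <= 2  # {1, 2}

# Mutually exclusive: even and odd can never happen together.
print(P(lambda o: even(o) and odd(o)))                     # 0
# Independent: P(even and low) equals P(even) * P(low).
print(P(lambda o: even(o) and low(o)), P(even) * P(low))   # 1/6 1/6
```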
general addition rule
⭐ General Addition Rule The general addition rule tells you how to find the probability that A or B happens — even when the events overlap. 📌 The formula P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Why subtract the intersection? Because if A…
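A small enumeration check of the rule on a die; the events "even" and "at least 4" are illustrative:

```python
from fractions import Fraction

outcomes = range(1, 7)
P = lambda s: Fraction(len(s), 6)

A = {o for o in outcomes if o % 2 == 0}  # even: {2, 4, 6}
B = {o for o in outcomes if o >= 4}      # at least 4: {4, 5, 6}

# Subtracting the intersection avoids double-counting {4, 6}.
lhs = P(A | B)
rhs = P(A) + P(B) - P(A & B)
print(lhs, rhs)  # 2/3 2/3
```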
Independent event
⭐ Independent Events Two events are independent when one happening does not change the probability of the other. That’s the whole heart of it. 📌 Formal definition Events A and B are independent if P(A ∩ B) = P(A) · P(B). This equation is…
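A check of the product rule on two fair coin flips (an assumed example):

```python
import itertools
from fractions import Fraction

# Sample space of two fair coin flips, all outcomes equally likely.
space = list(itertools.product("HT", repeat=2))
P = lambda event: Fraction(sum(1 for o in space if event(o)), len(space))

first_heads = lambda o: o[0] == "H"
second_heads = lambda o: o[1] == "H"

# Independence: P(A and B) == P(A) * P(B).
both = P(lambda o: first_heads(o) and second_heads(o))
print(both, P(first_heads) * P(second_heads))  # 1/4 1/4
```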
Complementary events
⭐ Complementary Events Two events are complements when they cover the entire sample space together and cannot happen at the same time. If A is an event, then its complement Aᶜ is everything in the sample space that is not in A: 📌 Key properties 🎯 Examples…
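A two-line illustration (the 30% rain probability is a made-up number):

```python
from fractions import Fraction

p_rain = Fraction(3, 10)  # illustrative probability of rain
p_no_rain = 1 - p_rain    # the complement covers everything else

print(p_no_rain)           # 7/10
print(p_rain + p_no_rain)  # complements always sum to 1
```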
Sample and Event
⭐ Sample vs. Event Think of probability as a story with two levels: 🎯 Sample Space (S) The sample space is the complete list of everything that could happen in an experiment. Examples Key idea…
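The two levels can be made concrete by enumerating two dice rolls; the event "the dice sum to 7" is an illustrative choice:

```python
import itertools

# Sample space: every possible outcome of rolling two dice.
S = set(itertools.product(range(1, 7), repeat=2))
# Event: a subset of the sample space, e.g. "the dice sum to 7".
E = {o for o in S if sum(o) == 7}

print(len(S), len(E))   # 36 outcomes, 6 of them in the event
print(len(E) / len(S))  # P(E) = 6/36
```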
Downsampling Techniques for Custom Batches in Training Machine Learning Models
Downsampling techniques can be used to create custom batches for training a machine learning model. These custom batches can be tailored to specific needs, such as improving training efficiency, handling imbalanced data, or focusing on…
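One common tailoring is class balancing: downsample the majority class so every batch has an even class mix. A minimal sketch (the 90/10 label split and batch size are assumptions):

```python
import random
random.seed(0)

def balanced_batches(samples, labels, batch_size):
    """Yield batches that downsample the majority class to a 50/50 class mix."""
    by_label = {}
    for s, y in zip(samples, labels):
        by_label.setdefault(y, []).append(s)
    half = batch_size // 2
    # The minority class limits how many balanced batches we can build.
    n_batches = min(len(v) for v in by_label.values()) // half
    for v in by_label.values():
        random.shuffle(v)
    for i in range(n_batches):
        batch = []
        for y, v in by_label.items():
            batch += [(s, y) for s in v[i * half:(i + 1) * half]]
        random.shuffle(batch)
        yield batch

# 90 negatives vs 10 positives; batches of 4 hold 2 examples of each class.
X = list(range(100))
y = [0] * 90 + [1] * 10
for batch in balanced_batches(X, y, batch_size=4):
    print([label for _, label in batch])
```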
downsampling for hyperparameter tuning
Downsampling for hyperparameter tuning reduces the dataset size to speed up model training and experimentation while preserving key data characteristics. Here’s a concise overview: Why Downsample for Hyperparameter Tuning? Key Considerations Practical Steps Pitfalls to…
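A stratified subsample is one way to shrink the data while preserving its class mix; this sketch (the 10% fraction and 80/20 split are assumptions) keeps the same proportions:

```python
import random
from collections import Counter
random.seed(0)

def stratified_subsample(samples, labels, frac):
    """Keep `frac` of each class so the subsample mirrors the full class mix."""
    by_label = {}
    for s, y in zip(samples, labels):
        by_label.setdefault(y, []).append(s)
    sub = []
    for y, v in by_label.items():
        k = max(1, round(len(v) * frac))
        sub += [(s, y) for s in random.sample(v, k)]
    random.shuffle(sub)
    return sub

X = list(range(1000))
y = [0] * 800 + [1] * 200
sub = stratified_subsample(X, y, frac=0.1)
print(len(sub), Counter(lbl for _, lbl in sub))  # 100 items, 80/20 mix preserved
```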
How to reduce catastrophic forgetting when fine-tuning neural networks
Catastrophic forgetting occurs when a neural network “overwrites” what it learned in a previous task while training on a new one. This happens because the weights optimized for the first task are changed to minimize…
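One simple mitigation is rehearsal: keep a buffer of old-task examples and mix a fixed fraction into every new-task batch, so the gradients keep "seeing" the old task. A framework-free sketch (batch size and replay fraction are assumptions):

```python
import random
random.seed(0)

def rehearsal_batches(new_data, replay_buffer, batch_size, replay_frac=0.25):
    """Mix a fraction of old-task examples into each new-task batch (rehearsal)."""
    n_replay = max(1, int(batch_size * replay_frac))
    n_new = batch_size - n_replay
    data = list(new_data)
    random.shuffle(data)
    for i in range(0, len(data) - n_new + 1, n_new):
        batch = data[i:i + n_new] + random.sample(replay_buffer, n_replay)
        random.shuffle(batch)
        yield batch

old_task = [("old", i) for i in range(50)]  # examples retained from task 1
new_task = [("new", i) for i in range(90)]
batches = list(rehearsal_batches(new_task, old_task, batch_size=8))
print(len(batches))  # each batch holds 6 new + 2 replayed old examples
```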
Textbooks Are All You Need
The central idea of the paper “Textbooks Are All You Need” is that data quality is significantly more important than data quantity or model size when training large language models (LLMs), particularly for code generation.…
The “Speed” Myth of sloths
Baby sloths are born with their eyes open, all their teeth, and the ability to climb immediately.



























