Riemann sum

A Riemann sum is a method used in calculus to approximate the integral (or area under a curve) of a function. It is named after the German mathematician Bernhard Riemann. The basic idea behind a Riemann sum is to break up the region under a curve into small rectangles, compute the area of each rectangle, and then sum those areas to approximate the total area under the curve.
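
As a quick sketch (the function f(x) = x² and the interval [0, 1] are illustrative choices, not from the post), a left Riemann sum in Python:

def left_riemann_sum(f, a, b, n):
    # Width of each of the n rectangles
    dx = (b - a) / n
    # Area of each rectangle: height at the left endpoint times the width
    return sum(f(a + i * dx) * dx for i in range(n))

print(left_riemann_sum(lambda x: x**2, 0.0, 1.0, 1000))  # ~0.3328; the true integral is 1/3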

PySpark: selecting and accessing data

The content outlines various PySpark functions used for data manipulation in DataFrames. Key functions include filtering with where(), limiting rows with limit(), returning distinct rows, dropping columns, and grouping by criteria. Each function includes a brief example, illustrating how to access, modify, and aggregate data effectively within PySpark.
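
A minimal sketch of these functions (the DataFrame contents and column names are made up for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["key", "value"])

df.where(df.value > 1).show()       # filter rows by a condition
df.limit(2).show()                  # keep only the first two rows
df.select("key").distinct().show()  # distinct values of a column
df.drop("value").show()             # drop a column
df.groupBy("key").count().show()    # group rows and aggregate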

Integral: How to weigh an elephant?

Lương Thế Vinh, a renowned scholar, ingeniously weighed an elephant by using water displacement, showcasing his mathematical brilliance. His method relates to integration, which sums small quantities to determine a total, underscoring the connection between math and practical problem-solving.

PySpark data frame creation song

This song and example code make PySpark DataFrame creation functions easier to remember. Key functions include creating DataFrames, displaying data, printing schemas, and filtering. The post shows how to manipulate data effectively in PySpark, making it a useful reference for users working with large datasets.
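
A minimal sketch of those creation functions (the sample rows are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Alice", 34), ("Bob", 45)], ["name", "age"])
df.show()                      # display the rows
df.printSchema()               # print the inferred schema
df.filter(df.age > 40).show()  # keep rows matching a condition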

pandas function song – grouping the data

This song and code examples help us understand and remember various Pandas functions for data manipulation, including grouping, aggregating, and transforming data. Key functions include groupby(), pivot_table(), resample(), rolling(), expanding(), cumsum(), cumprod(), cut(), qcut(), aggregate(), and transform().
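
A short sketch of a few of these (the toy frame is illustrative):

import pandas as pd

df = pd.DataFrame({"group": ["a", "a", "b"], "x": [1, 2, 3]})
print(df.groupby("group")["x"].sum())              # aggregate per group
print(df.groupby("group")["x"].transform("mean"))  # broadcast group means back to the rows
print(df["x"].cumsum())                            # running total
print(pd.cut(df["x"], bins=2))                     # bin values into equal-width intervals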

normal distribution song

This song helps us better remember the properties of the normal distribution. A normal distribution, also known as a Gaussian distribution, is a symmetrical, bell-shaped continuous probability distribution characterized by its mean (μ) and standard deviation (σ). It is symmetric and unimodal, and follows the 68-95-99.7 rule, which describes how much of the data falls within one, two, and three standard deviations of the mean.
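
The 68-95-99.7 rule can be checked numerically; a sketch using scipy.stats (assuming SciPy is available):

from scipy.stats import norm

# Probability within k standard deviations of the mean of a standard normal
for k in (1, 2, 3):
    print(k, norm.cdf(k) - norm.cdf(-k))  # ~0.683, ~0.954, ~0.997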

Pandas function song

A cute, catchy song on various Pandas functions applied to DataFrames. Key functions include sorting values, resetting the index, dropping columns and duplicates, sampling data, and handling missing values. Example codes illustrate each function’s output, demonstrating how to manipulate and visualize data effectively with Pandas.
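
A minimal sketch of these functions (the toy frame is illustrative):

import pandas as pd

df = pd.DataFrame({"a": [3.0, 1.0, 1.0, None], "b": ["x", "y", "y", "z"]})
print(df.sort_values("a"))                  # sort rows by a column
print(df.drop_duplicates())                 # remove duplicate rows
print(df.sample(2, random_state=0))         # random sample of rows
print(df.fillna(0).reset_index(drop=True))  # fill missing values, reset the index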

Can sound crack glass?

Sound can crack glass when it matches the glass's resonant frequency and is loud enough, typically over 100 decibels. Famous singers like Enrico Caruso and Ella Fitzgerald demonstrated this phenomenon. The process amplifies the vibrations until the glass can no longer withstand the stress, leading to cracking or shattering.

PyTorch basic computation function song

The provided content showcases a series of PyTorch functions with descriptions and examples. Functions like torch.abs, torch.ceil, torch.floor, torch.clamp, torch.std, torch.prod, and torch.unique are explained with their respective use cases. These functions are fundamental for manipulating tensors in PyTorch.
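
A quick sketch of these functions (the input values are illustrative):

import torch

t = torch.tensor([-1.5, 0.2, 2.7, 2.7])
print(torch.abs(t))              # element-wise absolute value
print(torch.ceil(t))             # round up
print(torch.floor(t))            # round down
print(torch.clamp(t, 0.0, 1.0))  # clip values into [0, 1]
print(torch.std(t))              # standard deviation
print(torch.prod(t))             # product of all elements
print(torch.unique(t))           # distinct values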

PyTorch function song: linear algebra operations

The provided content showcases common linear algebra operations in PyTorch, including determinant calculation, matrix inverse, LU decomposition, QR decomposition, Cholesky decomposition, SVD, eigenvalue and eigenvector computation, matrix and vector norms, trace calculation, solving linear systems, and other operations with code and output examples.
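
A sketch of a few of these via the torch.linalg namespace (the matrix is chosen symmetric positive definite so every factorization applies):

import torch

A = torch.tensor([[4.0, 2.0], [2.0, 3.0]])
print(torch.linalg.det(A))       # determinant
print(torch.linalg.inv(A))       # matrix inverse
L = torch.linalg.cholesky(A)     # Cholesky factor, A = L @ L.T
U, S, Vh = torch.linalg.svd(A)   # singular value decomposition
print(torch.linalg.eigvalsh(A))  # eigenvalues of a symmetric matrix
print(torch.linalg.norm(A))      # matrix norm (Frobenius by default)
b = torch.tensor([1.0, 2.0])
print(torch.linalg.solve(A, b))  # solve the linear system A x = b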

PyTorch Tensor Creation song & examples

This song and code examples cover the basic tensor creation functions in PyTorch, each illustrated with example output.
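
A sketch of the usual creation functions (which specific functions the post covers is an assumption):

import torch

print(torch.tensor([1, 2, 3]))  # from a Python list
print(torch.zeros(2, 3))        # 2x3 tensor of zeros
print(torch.ones(2, 3))         # 2x3 tensor of ones
print(torch.arange(0, 10, 2))   # evenly spaced integers
print(torch.linspace(0, 1, 5))  # 5 points from 0 to 1
print(torch.rand(2, 2))         # uniform random values in [0, 1)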

PyTorch function song & examples: Autograd, Random Number Generation, Loss Functions, Optimization

This song and code examples cover PyTorch's autograd, random number generation, loss functions, and optimization utilities, each with sample output.
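
A compact sketch of one example from each group (the numbers are illustrative):

import torch

# Autograd: track gradients through a computation
x = torch.tensor(2.0, requires_grad=True)
(x ** 2).backward()
print(x.grad)  # d(x^2)/dx = 2x = 4

# Random number generation
torch.manual_seed(0)
print(torch.randn(3))  # samples from a standard normal

# Loss function
print(torch.nn.MSELoss()(torch.tensor([1.0, 2.0]), torch.tensor([1.5, 2.5])))  # 0.25

# Optimization: one gradient-descent step on a parameter
w = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)
((w - 3.0) ** 2).backward()
opt.step()
print(w)  # moved from 1.0 toward 3.0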

PyTorch function song & examples: Tensor Type & Device Management

The provided content discusses tensor reshaping and tensor type and device management in PyTorch. It covers functions such as tensor.view(), tensor.reshape(), tensor.transpose(), tensor.squeeze(), tensor.unsqueeze(), tensor.to(), tensor.type(), tensor.is_cuda, tensor.cpu(), and tensor.cuda(). Demonstrated examples showcase effective memory management and computation, especially when utilizing GPUs.
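
A sketch of these calls:

import torch

t = torch.arange(6)
print(t.view(2, 3))               # reshape a contiguous tensor without copying
print(t.reshape(3, 2))            # reshape, copying only if needed
print(t.unsqueeze(0).shape)       # add a dimension of size 1
print(t.unsqueeze(0).squeeze().shape)  # drop dimensions of size 1 again
print(t.to(torch.float32).dtype)  # convert dtype with .to()
print(t.is_cuda)                  # whether the tensor lives on a GPU
if torch.cuda.is_available():
    t = t.cuda()                  # move to the GPU...
    t = t.cpu()                   # ...and back to the CPU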

PyTorch Tensor Operations song & examples

A song and examples covering element-wise addition, subtraction, multiplication, and division; matrix multiplication; reductions such as sum, mean, max, and min; and concatenation and stacking of tensors.
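
A sketch of these operations:

import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.ones(2, 2)
print(a + b, a - b, a * b, a / b)           # element-wise arithmetic
print(a @ b)                                # matrix multiplication (torch.matmul)
print(a.sum(), a.mean(), a.max(), a.min())  # reductions
print(torch.cat([a, b], dim=0))             # concatenate along an existing dimension
print(torch.stack([a, b], dim=0))           # stack along a new dimension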

Support Vector Machine + Python & R Codes

Support Vector Classifier (SVC) is a powerful algorithm for classification tasks, capable of handling linear and non-linear data using different kernel functions. It efficiently handles high-dimensional data for applications like image recognition and bioinformatics. Python and R codes demonstrate SVM usage for binary classification with breast cancer and mtcars datasets, respectively.
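
A minimal scikit-learn sketch along the lines the post describes for Python, using the built-in breast cancer dataset (the hyperparameters are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf")  # non-linear RBF kernel; kernel="linear" handles linear data
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # test-set accuracy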

Logistic regression with L1 or L2 penalty with codes in Python and R

Logistic regression with L1 or L2 penalty adds regularization to prevent overfitting and improve model generalization. L1 penalty (Lasso) encourages sparsity in the model, making it suitable for datasets with many irrelevant features. L2 penalty (Ridge) retains all features with reduced importance. Python and R codes demonstrate implementation and evaluation of these regression techniques.
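
A minimal sketch of both penalties in scikit-learn (the dataset and C value are illustrative; the L1 penalty needs a compatible solver such as liblinear or saga):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

l1 = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X_train, y_train)
l2 = LogisticRegression(penalty="l2", C=1.0, max_iter=5000).fit(X_train, y_train)
# L1 typically zeroes out some coefficients; L2 keeps them all, just smaller
print((l1.coef_ != 0).sum(), (l2.coef_ != 0).sum())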

What’s classification

Classification organizes items into categories based on shared criteria. In data work, it means sorting records into classes, either manually or automatically with algorithms. It is used across science, business, and technology to analyze and predict from data, and is crucial for document categorization, image recognition, sentiment analysis, and spam filtering.

Adjusted R squared

The coefficient of determination, or R-squared, measures how well the independent variables explain the variability of the dependent variable in a regression model. Its limitation is that it never decreases when a new feature is added, whether the feature is useful or not. Adjusted R-squared improves on this by accounting for the number of predictors, making it more reliable for assessing explanatory power.
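
For reference, a one-line implementation of the usual formula, with n observations and p predictors:

def adjusted_r2(r2, n, p):
    # Penalizes R-squared for the number of predictors p
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r2(0.90, 100, 5))  # ~0.8947, slightly below the raw R-squared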

Feature selection & Model Selection

Feature selection involves identifying and including essential variables in the model, possibly leading to improved performance and interpretability. Adjusted R-squared is a common metric for regression analysis, addressing overfitting by penalizing unnecessary variables and offering an accurate model representation.

Sum of Squares & coefficients of determination with Python & R codes

The coefficient of determination (R-squared) measures how well a model explains the variance of the response variable. In this example, Python and R are used to calculate R-squared for linear regression. Higher R-squared value and the plot indicate a good fit, demonstrating the effectiveness of the model.
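
The computation in outline (toy numbers for illustration):

import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
print(1 - ss_res / ss_tot)            # R-squared = 0.98 here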

Multiple linear regression

Multiple linear regression is a powerful tool for modeling relationships between multiple independent variables and a single dependent variable. Let's take a look at some examples with code in Python and R to demonstrate its practical application.
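
A minimal Python sketch with two independent variables (the toy data are constructed so the fit is exact):

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 2], [2, 1], [3, 4], [4, 3]])  # two predictors
y = np.array([6, 5, 12, 11])                    # y = x1 + 2*x2 + 1
model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # ~[1. 2.] and ~1.0
print(model.predict([[5, 5]]))        # ~16.0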

Review: Maximum Likelihood Estimation

Maximum Likelihood Estimation (MLE) is a statistical method that estimates parameters by maximizing the likelihood function. For example, in a Poisson distribution, the MLE for the rate parameter λ is the sample mean; the detailed derivation is outlined below.
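
In outline, for a sample $x_1, \dots, x_n$ from a Poisson($\lambda$) distribution:

$$\ell(\lambda) = \log \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \Big(\sum_{i=1}^{n} x_i\Big) \log \lambda - n\lambda - \sum_{i=1}^{n} \log(x_i!)$$

$$\ell'(\lambda) = \frac{\sum_{i=1}^{n} x_i}{\lambda} - n = 0 \quad\Longrightarrow\quad \hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$$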

Comparing forward, backward, stepwise feature selection

Forward selection adds features one by one, optimizing model performance but potentially missing the best subset. Backward selection starts with all features and removes the least significant, refining the model but being more computationally intensive. Stepwise selection combines both methods, adding or removing features for a balanced approach but can be complex.
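
Forward selection can be sketched with scikit-learn's SequentialFeatureSelector (the estimator and feature count are illustrative choices):

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
# direction="forward" adds features one at a time; "backward" removes them
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=5000),
                                n_features_to_select=5, direction="forward")
sfs.fit(X, y)
print(sfs.get_support())  # boolean mask of the selected features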

Hyperparameter tuning by train-validation-test split – process & example

This post implements Lasso regression with a train-validation-test split to find the optimal regularization parameter. In Python, this involves splitting the data, training Lasso models with different alpha values, selecting the best alpha, retraining the model, and evaluating on the test set; the R version follows the same steps with lambda.
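
The Python side in outline (the synthetic data and alpha grid are illustrative):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

best_alpha, best_mse = None, np.inf
for alpha in [0.01, 0.1, 1.0, 10.0]:
    mse = mean_squared_error(y_val, Lasso(alpha=alpha).fit(X_train, y_train).predict(X_val))
    if mse < best_mse:
        best_alpha, best_mse = alpha, mse

# Retrain on train + validation with the best alpha, then evaluate once on the test set
final = Lasso(alpha=best_alpha).fit(X_tmp, y_tmp)
print(best_alpha, mean_squared_error(y_test, final.predict(X_test)))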

Grid search and train-validation-test split for hyperparameter tuning – intro

The training-validation-test split involves using the training set to fit the model, the validation set to tune hyperparameters, and the test set to evaluate performance. Python’s scikit-learn library can be used for this process, ensuring the model generalizes well to new data by evaluating it on unseen data and avoiding overfitting.
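
A grid-search variant of the same idea in scikit-learn (the dataset and parameter grid are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Cross-validation on the training data stands in for a separate validation set
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))  # final check on unseen data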

A comic guide to underfitting

Underfitting in machine learning occurs when a model fails to capture underlying data patterns due to excessive simplicity or insufficient training data. To address it, select more complex models, add features, obtain more training data, fine-tune hyperparameters, and optimize the model's architecture. Too few features can also cause underfitting, calling for relevant additional features or more advanced modeling techniques.

Evaluation measure: MSE versus MAE, RMSE

This comic explains MSE and MAE, the commonly used evaluation metrics for regression. MSE emphasizes large deviations, while MAE provides a more robust measure when outliers are less significant. MSE is preferred as a loss function due to its ability to penalize larger errors more heavily and its suitability for mathematical optimization, stability, and statistical interpretation. RMSE is the square root of MSE and also penalizes large errors.
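
The three metrics side by side (toy numbers for illustration):

import numpy as np

y = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.5, 5.0, 10.0])
errors = y - y_hat
print(np.mean(errors ** 2))           # MSE: squaring magnifies the one large miss
print(np.mean(np.abs(errors)))        # MAE: each error counts in proportion
print(np.sqrt(np.mean(errors ** 2)))  # RMSE: back in the original units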
