Sum of Squares & the coefficient of determination with Python & R code

The coefficient of determination, commonly denoted R^2, measures how well a model explains the variability of the response variable. It ranges from 0 to 1: a value of 1 means the model explains all of the variance, and 0 means it explains none of it.
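
This is where the sums of squares in the title come in: R^2 compares the residual sum of squares to the total sum of squares,

R^2 = 1 - SS_res / SS_tot,

where SS_res = Σ_i (y_i - ŷ_i)^2 measures the squared distances between the observations and the fitted values, and SS_tot = Σ_i (y_i - ȳ)^2 measures the squared distances between the observations and their mean ȳ.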

Here’s a step-by-step example of computing the coefficient of determination in Python, using scikit-learn to fit a linear regression and compute R^2:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import matplotlib.pyplot as plt

# Generate sample data
np.random.seed(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)  # Feature
y = 3 * X.flatten() + np.random.normal(0, 1, 100)  # Target with some noise

# Fit the model
model = LinearRegression()
model.fit(X, y)

# Predict
y_pred = model.predict(X)

# Compute R^2 score
r2 = r2_score(y, y_pred)
print(f"Coefficient of Determination (R^2) using scikit-learn: {r2:.2f}")

# Plotting
plt.scatter(X, y, color='blue', label='Data')
plt.plot(X, y_pred, color='red', linewidth=2, label='Fitted line')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression and R^2')
plt.legend()
plt.show()
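
To tie the result back to the sums of squares, here is a minimal sketch that reuses the y and y_pred arrays from the example above and computes R^2 directly; it should match the value reported by r2_score:

# Manual R^2 from the residual and total sums of squares
ss_res = np.sum((y - y_pred) ** 2)      # residual sum of squares
ss_tot = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot
print(f"Coefficient of Determination (R^2) from sums of squares: {r2_manual:.2f}")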

Here is the equivalent example in R, fitting the model with lm() and plotting with ggplot2:

# Load the ggplot2 package
library(ggplot2)

# Generate sample data
set.seed(42)
X <- seq(0, 10, length.out = 100)
y <- 3 * X + rnorm(100)  # Target with some noise
df <- data.frame(X = X, y = y)

# Fit the linear model
model <- lm(y ~ X, data = df)

# Summary of the model to get R^2
summary(model)

# Extract R^2
r_squared <- summary(model)$r.squared
cat("Coefficient of Determination (R^2) using ggplot2: ", round(r_squared, 2), "\n")

# Plotting using ggplot2
ggplot(df, aes(x = X, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  ggtitle("Linear Regression and R^2") +
  xlab("X") +
  ylab("y") +
  annotate("text", x = 1, y = max(df$y), label = paste("R^2 =", round(r_squared, 2)), hjust = 0)

The output shows a high coefficient of determination, and the fitted line in the plot closely tracks the data, indicating a good fit.

