Simple Linear Regression Review: Sunlight & Selfie

Simple linear regression is a statistical method used to model and analyze the relationship between two continuous variables. Specifically, it aims to predict the value of one variable (the dependent or response variable) based on the value of another variable (the independent or predictor variable). The relationship is assumed to be linear, meaning it can be described with a straight line.

In simple linear regression, there is only one independent variable. The relationship between y and x is modeled by a straight line:

y = \beta_0 + \beta_1 x + \varepsilon

Where:

  • y is the dependent variable (the output).
  • x is the independent variable (the input).
  • \beta_0 is the intercept (the value of y when x = 0).
  • \beta_1 is the slope of the line (the change in y for a unit change in x).
  • \varepsilon is the error term (the difference between the observed and predicted values of y).

The formulas to estimate the slope (\beta_1) and intercept (\beta_0) are:

\beta_1 = \frac{ \sum (x_i - \bar{x})(y_i - \bar{y}) }{ \sum (x_i - \bar{x})^2 }

\beta_0 = \bar{y} - \beta_1 \bar{x}

Where:

  • \bar{x} is the mean of the x values.
  • \bar{y} is the mean of the y values.
  • x_i and y_i are the individual data points.

Example: Sunlight & Selfies

Sunlight (hours)   | 1 | 2  | 3  | 4  | 5
Number of selfies  | 5 | 10 | 12 | 15 | 20

Calculate the means:
\bar{x} = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3
\bar{y} = \frac{5 + 10 + 12 + 15 + 20}{5} = \frac{62}{5} = 12.4

Calculate the slope (\beta_1):
\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}

First, the numerator:
\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = (1-3)(5-12.4) + (2-3)(10-12.4) + (3-3)(12-12.4) + (4-3)(15-12.4) + (5-3)(20-12.4)
= (-2)(-7.4) + (-1)(-2.4) + (0)(-0.4) + (1)(2.6) + (2)(7.6)
= 14.8 + 2.4 + 0 + 2.6 + 15.2 = 35

Then, the denominator:
\sum_{i=1}^{n} (x_i - \bar{x})^2 = (1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2 = 4 + 1 + 0 + 1 + 4 = 10

So:
\beta_1 = \frac{35}{10} = 3.5

Calculate the intercept (\beta_0):
\beta_0 = \bar{y} - \beta_1 \bar{x} = 12.4 - 3.5 \times 3 = 12.4 - 10.5 = 1.9

The estimated linear regression model is:
y = 1.9 + 3.5x
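As a quick sanity check, the fitted line always passes through the point of means (\bar{x}, \bar{y}), so plugging in x = 3 should return 12.4. A minimal sketch (the `predict` helper is just for illustration, not part of the original code):

```python
beta_0 = 1.9  # estimated intercept
beta_1 = 3.5  # estimated slope

def predict(x):
    """Return the fitted value y = beta_0 + beta_1 * x."""
    return beta_0 + beta_1 * x

# At the mean of x (x = 3), the line recovers the mean of y (12.4)
print(predict(3))
```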

Python code

import numpy as np

# Given data
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 10, 12, 15, 20])

# Calculate the means of x and y
mean_x = np.mean(x)
mean_y = np.mean(y)

# Calculate the slope (beta_1)
numerator = np.sum((x - mean_x) * (y - mean_y))
denominator = np.sum((x - mean_x) ** 2)
beta_1 = numerator / denominator

# Calculate the intercept (beta_0)
beta_0 = mean_y - beta_1 * mean_x

print(f"The estimated linear regression model is: y = {beta_0:.2f} + {beta_1:.2f}x")
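The manual calculation can be cross-checked against NumPy's built-in least-squares fit. `np.polyfit` with degree 1 returns the coefficients in descending order (slope first, then intercept), and should reproduce the same estimates:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 10, 12, 15, 20])

# Degree-1 polynomial fit = simple linear regression
slope, intercept = np.polyfit(x, y, 1)

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")  # 1.90 and 3.50
```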

R code:

# Given data
x <- c(1, 2, 3, 4, 5)
y <- c(5, 10, 12, 15, 20)

# Calculate the means of x and y
mean_x <- mean(x)
mean_y <- mean(y)

# Calculate the slope (beta_1)
numerator <- sum((x - mean_x) * (y - mean_y))
denominator <- sum((x - mean_x)^2)
beta_1 <- numerator / denominator

# Calculate the intercept (beta_0)
beta_0 <- mean_y - beta_1 * mean_x

cat("The estimated linear regression model is: y =", beta_0, "+", beta_1, "x\n")

