Simple Linear Regression & the Least Squares Method

Simple linear regression is a statistical method used to model and analyze the relationship between two continuous variables. Specifically, it aims to predict the value of one variable (the dependent or response variable) based on the value of another variable (the independent or predictor variable). The relationship is assumed to be linear, meaning it can be described with a straight line.

The basic form of the simple linear regression equation is:

Y = a + bX

Where:

  • Y is the dependent variable (response).
  • X is the independent variable (input).
  • a is the intercept (the value of Y when X is 0).
  • b is the slope (the change in Y for a one-unit change in X).

Observed data rarely fall exactly on a line, so the relationship between the dependent variable Y and the independent variable X is usually modeled as:
Y = a + bX + \epsilon
where \epsilon represents the error term (the difference between the observed value and the predicted value).
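
For example, with a = 2 and b = 0.5 (numbers chosen purely for illustration), an observation at X = 10 has predicted value 2 + 0.5 \times 10 = 7; if the observed value is Y = 7.4, the error term is \epsilon = 0.4.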

For each data point (x_i, y_i), the residual is the difference between the observed value y_i and the predicted value \hat{y}_i:
\text{Residual} = y_i - \hat{y}_i = y_i - (a + bx_i)
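
Using the illustrative numbers above, the point (x_i, y_i) = (10, 7.4) has \hat{y}_i = 2 + 0.5 \times 10 = 7, so its residual is 7.4 - 7 = 0.4.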

The sum of squared residuals (SSR) is the total of the squared differences between the observed values and the values predicted by the model; it measures how well the line fits the data.
\text{SSR} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - (a + bx_i))^2
where n is the number of data points.
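
To make this concrete, here is a minimal Python sketch (the data and the candidate coefficients are invented purely for illustration) that computes the residuals and SSR for one candidate line:

import numpy as np

# Hypothetical sample data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])

# A candidate line y = a + b*x (coefficients chosen arbitrarily)
a, b = 1.0, 1.0

y_hat = a + b * x              # predicted values
residuals = y - y_hat          # observed minus predicted
ssr = np.sum(residuals ** 2)   # sum of squared residuals
print(ssr)

Different choices of a and b give different SSR values; the least squares method, described next, picks the pair with the smallest one.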

The method of least squares finds the values of a (intercept) and b (slope) that minimize the sum of the squared differences between the observed values and the values predicted by the regression line, i.e., that minimize SSR.

By taking the partial derivatives of the SSR with respect to a and b and setting them to zero, we get the normal equations, which are then solved to find the optimal values of a and b:
\frac{\partial}{\partial a} \text{SSR} = 0
\frac{\partial}{\partial b} \text{SSR} = 0
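
Carrying out the differentiation (a standard calculus step, spelled out here for clarity) gives:
\frac{\partial \text{SSR}}{\partial a} = -2 \sum_{i=1}^{n} \left( y_i - (a + bx_i) \right) = 0
\frac{\partial \text{SSR}}{\partial b} = -2 \sum_{i=1}^{n} x_i \left( y_i - (a + bx_i) \right) = 0
Dividing by -2 and rearranging yields the normal equations:
\sum_{i=1}^{n} y_i = na + b \sum_{i=1}^{n} x_i
\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2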

The resulting values of a and b provide the coefficients for the best-fitting line. The estimates are:

b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
and
a = \bar{y} - b\bar{x}
where:

  • x_i and y_i are the individual sample points.
  • \bar{x} and \bar{y} are the means of the X and Y samples, respectively.
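
As a minimal Python sketch of these formulas (reusing the invented data from the SSR example above), the closed-form estimates can be computed directly and cross-checked against NumPy's built-in degree-1 polynomial fit:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])

x_bar, y_bar = x.mean(), y.mean()

# Closed-form least squares estimates
b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
a = y_bar - b * x_bar

# Cross-check: np.polyfit returns [slope, intercept] for degree 1
b_check, a_check = np.polyfit(x, y, 1)
print(a, b)
print(a_check, b_check)  # should agree up to floating-point error

Both computations fit the same line, since np.polyfit with degree 1 solves the same least squares problem.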
