Simple linear regression is a statistical method used to model and analyze the relationship between two continuous variables. Specifically, it aims to predict the value of one variable (the dependent or response variable) based on the value of another variable (the independent or predictor variable). The relationship is assumed to be linear, meaning it can be described with a straight line.

The basic form of the simple linear regression equation is:

$$y = \beta_0 + \beta_1 x$$

Where:

- $y$ is the dependent variable (response).
- $x$ is the independent variable (input).
- $\beta_0$ is the intercept (the value of $y$ when $x$ is 0).
- $\beta_1$ is the slope (the change in $y$ for a one-unit change in $x$).
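
For instance, with an intercept of $\beta_0 = 2$ and a slope of $\beta_1 = 3$ (numbers chosen purely for illustration), the line $y = 2 + 3x$ predicts $y = 8$ at $x = 2$, and each one-unit increase in $x$ raises the prediction by 3.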
So, the relationship between the dependent variable $y$ and the independent variable $x$ is often modeled as:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where $\varepsilon$ represents the error term (the difference between the observed value and the predicted value).
For each data point $(x_i, y_i)$, the residual $e_i$ is the difference between the observed value $y_i$ and the predicted value $\hat{y}_i = \beta_0 + \beta_1 x_i$:

$$e_i = y_i - \hat{y}_i$$
The sum of squared residuals (SSR) is the total of the squared differences between the observed values and the values predicted by the model, and is used to measure the model's accuracy:

$$\text{SSR} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta_1 x_i) \right)^2$$

where $n$ is the number of data points.
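
To make this concrete, here is a minimal Python sketch that computes the residuals and the SSR for a candidate line; the data and the coefficients $\beta_0 = 0.2$, $\beta_1 = 1.9$ are made up purely for illustration:

```python
# Residuals and SSR for a candidate line (made-up data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

beta0, beta1 = 0.2, 1.9  # candidate intercept and slope

# Residual for each point: observed y minus predicted y.
residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]

# SSR: sum of the squared residuals.
ssr = sum(e ** 2 for e in residuals)
print(f"SSR = {ssr:.4f}")  # SSR = 0.1500
```

A different choice of $\beta_0$ and $\beta_1$ gives a different SSR; least squares, described next, finds the pair that makes it as small as possible.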
The method of least squares finds the values of $\beta_0$ (intercept) and $\beta_1$ (slope) that minimize the sum of the squared differences between the observed values and the values predicted by the regression line, i.e., that minimize SSR.

By taking the partial derivatives of the SSR with respect to $\beta_0$ and $\beta_1$ and setting them to zero, we get the normal equations, which are then solved to find the optimal values of $\beta_0$ and $\beta_1$:

$$\frac{\partial\,\text{SSR}}{\partial \beta_0} = -2 \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right) = 0$$

$$\frac{\partial\,\text{SSR}}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i \left( y_i - \beta_0 - \beta_1 x_i \right) = 0$$

Dividing by $-2$ and rearranging gives the system:

$$n \beta_0 + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \qquad \beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$
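
As a numeric check, the rearranged normal equations form a 2×2 linear system that can be solved directly. Here is a sketch using NumPy (assumed available), with the same made-up data as in the earlier snippet:

```python
import numpy as np

# Same made-up data as the earlier SSR sketch.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])
n = len(x)

# Normal equations in matrix form:
#   [ n       sum(x)   ] [beta0]   [ sum(y)  ]
#   [ sum(x)  sum(x^2) ] [beta1] = [ sum(xy) ]
A = np.array([[n, x.sum()],
              [x.sum(), (x ** 2).sum()]])
b = np.array([y.sum(), (x * y).sum()])

beta0_hat, beta1_hat = np.linalg.solve(A, b)
print(beta0_hat, beta1_hat)  # approximately 0.15 and 1.95
```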
The resulting values of $\hat{\beta}_0$ and $\hat{\beta}_1$ provide the coefficients for the best-fitting line. The estimates are:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

and

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

where:

- $x_i$ and $y_i$ are the individual sample points.
- $\bar{x}$ and $\bar{y}$ are the means of the $x$ and $y$ samples, respectively.
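
To show how these formulas translate into code, here is a minimal sketch in plain Python (no libraries; the data is the same made-up sample as above):

```python
# Closed-form least-squares estimates for simple linear regression.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
beta1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
          / sum((xi - x_bar) ** 2 for xi in x)

# Intercept: y_bar - beta1_hat * x_bar
beta0_hat = y_bar - beta1_hat * x_bar

print(f"fitted line: y = {beta0_hat:.2f} + {beta1_hat:.2f}x")
# fitted line: y = 0.15 + 1.95x
```

The result matches the normal-equation solution above. In practice, library routines such as `numpy.polyfit(x, y, 1)` or `scipy.stats.linregress` compute the same least-squares fit.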