Multiple regression analysis: waiting time to log in to Windows

Multiple regression analysis can be used to understand the relationship between the waiting time to log in to Windows (dependent variable) and several independent variables. Let’s assume we have the following independent variables:

Number of startup applications: The number of applications that start automatically when Windows boots up.
System RAM (in GB): The amount of RAM installed in the system.
Processor speed (in GHz): The speed of the system’s processor.
Disk speed (in MB/s): The speed of the system’s hard drive or SSD.

Suppose that we have a toy dataset like this:

Waiting Time (s)	Startup Applications	RAM (GB)	Processor Speed (GHz)	Disk Speed (MB/s)
45	10	8	2.5	150
30	5	16	3.0	500
60	15	4	2.0	100
25	3	8	3.5	250
40	8	8	3.0	200

Multiple Regression Analysis

The general form of the multiple regression equation is:

$\text{Waiting Time} = \beta_0 + \beta_1 \times \text{Startup Applications} + \beta_2 \times \text{RAM} + \beta_3 \times \text{Processor Speed} + \beta_4 \times \text{Disk Speed} + \epsilon$

Where:

$\beta_0$ is the intercept,
$\beta_1, \beta_2, \beta_3,$ and $\beta_4$ are the coefficients of the independent variables,
$\epsilon$ is the error term.

Matrix Form Representation:

The multiple regression model can be represented in matrix form as:

$\mathbf{Y} = \mathbf{X} \mathbf{\beta} + \mathbf{\epsilon}$

Where:

$\mathbf{Y}$ is the vector of the dependent variable (Waiting Time).
$\mathbf{X}$ is the matrix of independent variables (including the intercept term).
$\mathbf{\beta}$ is the vector of coefficients.
$\mathbf{\epsilon}$ is the vector of errors.

Given the dataset:

$\mathbf{Y} = \begin{pmatrix}45 \\30 \\60 \\25 \\40\end{pmatrix},$
$\mathbf{X} = \begin{pmatrix}1 & 10 & 8 & 2.5 & 150 \\1 & 5 & 16 & 3.0 &500\\1 & 15 & 4 & 2.0 & 100 \\1 & 3 & 8 & 3.5 & 250 \\1 & 8 & 8 & 3.0 & 200 \end{pmatrix}$
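
As a quick sanity check before fitting, the design matrix can be assembled from the table columns and its shape and rank inspected. This is a minimal sketch; the variable names (`startup_apps`, `ram_gb`, `cpu_ghz`, `disk_mbps`) are illustrative and not part of the original example.

```python
import numpy as np

# Columns copied from the toy dataset (variable names are illustrative)
startup_apps = np.array([10, 5, 15, 3, 8])
ram_gb = np.array([8, 16, 4, 8, 8])
cpu_ghz = np.array([2.5, 3.0, 2.0, 3.5, 3.0])
disk_mbps = np.array([150, 500, 100, 250, 200])

# Prepend a column of ones so that beta_0 acts as the intercept
X = np.column_stack(
    [np.ones(len(startup_apps)), startup_apps, ram_gb, cpu_ghz, disk_mbps]
)

n_obs, n_params = X.shape
rank = np.linalg.matrix_rank(X)
print(X.shape, rank)
```

Here the matrix is square (five observations, five coefficients) and has full rank, so the model has exactly as many unknowns as data points. That is worth keeping in mind when reading the fitted coefficients.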

Computing the Coefficients:

The coefficients can be computed using the Normal Equation:

$\mathbf{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}$
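
For readers who want to see where this formula comes from, here is a brief sketch, assuming $\mathbf{X}^T \mathbf{X}$ is invertible. Ordinary least squares chooses $\mathbf{\beta}$ to minimize the sum of squared errors:

$S(\mathbf{\beta}) = (\mathbf{Y} - \mathbf{X} \mathbf{\beta})^T (\mathbf{Y} - \mathbf{X} \mathbf{\beta})$

Setting the gradient with respect to $\mathbf{\beta}$ to zero gives:

$-2 \mathbf{X}^T (\mathbf{Y} - \mathbf{X} \mathbf{\beta}) = \mathbf{0} \quad \Rightarrow \quad \mathbf{X}^T \mathbf{X} \mathbf{\beta} = \mathbf{X}^T \mathbf{Y}$

Multiplying both sides by $(\mathbf{X}^T \mathbf{X})^{-1}$ yields the Normal Equation above.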

Let’s compute this using Python:

import numpy as np
# Define the dependent variable vector Y
Y = np.array([45, 30, 60, 25, 40])
# Define the independent variable matrix X
X = np.array([
    [1, 10, 8, 2.5, 150],
    [1, 5, 16, 3.0, 500],
    [1, 15, 4, 2.0, 100],
    [1, 3, 8, 3.5, 250],
    [1, 8, 8, 3.0, 200]
])
# Compute the coefficients using the Normal Equation
beta = np.linalg.inv(X.T @ X) @ X.T @ Y
# Display the coefficients (print so this also works outside a notebook)
print(beta)

The computed coefficients $\mathbf{\beta}$ are:

$\mathbf{\beta} = \begin{pmatrix} 3.33333333 \\ 3.33333333 \\ 1.74527059 \times 10^{-13} \\ 3.33333333 \\ -5.86336535 \times 10^{-16} \end{pmatrix}$

Interpreting the coefficients:

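Intercept ( $\beta_0$ ): $3.33333333$
Startup Applications ( $\beta_1$ ): $3.33333333$
RAM ( $\beta_2$ ): $1.74527059 \times 10^{-13}$ (zero up to floating-point rounding)
Processor Speed ( $\beta_3$ ): $3.33333333$
Disk Speed ( $\beta_4$ ): $-5.86336535 \times 10^{-16}$ (zero up to floating-point rounding)

In this toy dataset, the fitted waiting time depends only on the number of startup applications and the processor speed, while the coefficients for the amount of RAM and the disk speed are effectively zero. No statistical significance can be read from this fit, however: with five observations and five coefficients, the model reproduces the data exactly and leaves no residual degrees of freedom for estimating uncertainty. The positive coefficient for processor speed, which would mean a faster processor lengthens the wait, should likewise be read as an artifact of the tiny dataset rather than a finding.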
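
One way to check how much weight these coefficients can bear is to look at the residuals and the residual degrees of freedom. The sketch below uses `np.linalg.lstsq` instead of the explicit inverse (a swapped-in solver, chosen here because it avoids forming $(\mathbf{X}^T \mathbf{X})^{-1}$); the name `df_resid` is an illustrative choice.

```python
import numpy as np

Y = np.array([45, 30, 60, 25, 40], dtype=float)
X = np.array([
    [1, 10, 8, 2.5, 150],
    [1, 5, 16, 3.0, 500],
    [1, 15, 4, 2.0, 100],
    [1, 3, 8, 3.5, 250],
    [1, 8, 8, 3.0, 200],
], dtype=float)

# Least-squares solve; lstsq also returns the rank of X
beta, _, rank, _ = np.linalg.lstsq(X, Y, rcond=None)

# Residuals: observed minus fitted waiting times
residuals = Y - X @ beta

# Residual degrees of freedom: observations minus estimated coefficients
df_resid = X.shape[0] - rank

print(np.round(beta, 6))
print(np.abs(residuals).max())
print(df_resid)
```

The residuals come out at rounding-error level and `df_resid` is 0, so the fit is an exact interpolation of the five rows rather than a statistical estimate.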

Python Example using Statsmodels

Here’s an example of how to perform this regression analysis in Python:

import pandas as pd
import statsmodels.api as sm
# Sample data
data = {
    'Waiting Time': [45, 30, 60, 25, 40],
    'Startup Applications': [10, 5, 15, 3, 8],
    'RAM': [8, 16, 4, 8, 8],
    'Processor Speed': [2.5, 3.0, 2.0, 3.5, 3.0],
    'Disk Speed': [150, 500, 100, 250, 200]
}
df = pd.DataFrame(data)
# Define the dependent and independent variables
X = df[['Startup Applications', 'RAM', 'Processor Speed', 'Disk Speed']]
y = df['Waiting Time']
# Add a constant to the independent variables
X = sm.add_constant(X)
# Fit the regression model
model = sm.OLS(y, X).fit()
# Print the model summary
print(model.summary())

Interpreting Results

The output will provide various statistics, including the coefficients ( $\beta$ values), p-values, R-squared value, and more. The coefficients indicate the expected change in the waiting time for a one-unit change in the respective independent variable, holding all other variables constant.
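
The "one-unit change, holding all other variables constant" reading can be illustrated numerically. This is a minimal sketch using the toy data from this post; the `baseline` machine (8 startup applications, 8 GB RAM, 3.0 GHz, 200 MB/s) is simply the last row of the dataset, and the variable names are illustrative.

```python
import numpy as np

Y = np.array([45, 30, 60, 25, 40], dtype=float)
X = np.array([
    [1, 10, 8, 2.5, 150],
    [1, 5, 16, 3.0, 500],
    [1, 15, 4, 2.0, 100],
    [1, 3, 8, 3.5, 250],
    [1, 8, 8, 3.0, 200],
], dtype=float)

beta, *_ = np.linalg.lstsq(X, Y, rcond=None)

# A baseline configuration: [intercept, startup apps, RAM, GHz, MB/s]
baseline = np.array([1, 8, 8, 3.0, 200])

# Same machine with exactly one more startup application
one_more_app = baseline.copy()
one_more_app[1] += 1

# The difference in predicted waiting time equals the coefficient beta_1
change = one_more_app @ beta - baseline @ beta
print(change, beta[1])
```

Because the model is linear, the predicted change does not depend on which baseline is chosen; it is always the coefficient of the variable that was changed.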

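R-squared: The proportion of the variation in the dependent variable that the independent variables explain, on a scale from 0 to 1.
Coefficients: Represent the magnitude and direction of the relationship between each independent variable and the dependent variable.
P-values: Help determine the statistical significance of each coefficient. A common threshold for significance is 0.05.

For the five-row toy dataset used here, the model has as many coefficients as observations, so R-squared is 1 and the standard errors and p-values are undefined (statsmodels will typically show them as inf or nan). These statistics only become meaningful with many more observations than coefficients.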
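
To make the R-squared definition concrete, it can be computed by hand from the residual and total sums of squares. This is a sketch with NumPy on the toy data from this post, not the statsmodels implementation; the names `ss_res` and `ss_tot` are illustrative.

```python
import numpy as np

y = np.array([45, 30, 60, 25, 40], dtype=float)
X = np.array([
    [1, 10, 8, 2.5, 150],
    [1, 5, 16, 3.0, 500],
    [1, 15, 4, 2.0, 100],
    [1, 3, 8, 3.5, 250],
    [1, 8, 8, 3.0, 200],
], dtype=float)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

# Residual sum of squares: variation left unexplained by the model
ss_res = np.sum((y - fitted) ** 2)

# Total sum of squares: variation of y around its mean
ss_tot = np.sum((y - y.mean()) ** 2)

r_squared = 1 - ss_res / ss_tot
print(ss_tot, r_squared)
```

On this dataset the residual sum of squares is at rounding-error level, so R-squared is 1 to within floating-point precision. That reflects the exact fit of five coefficients to five rows, not genuine explanatory power.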
