Simple linear regression using train-test split in Python & R

Below is an example of simple linear regression with a train-test split. The process is as follows:

1. Generate a synthetic dataset:

We create 100 data points X uniformly distributed between 0 and 2.
The target values y are generated using a linear relationship y = 4 + 3X with some added Gaussian noise for realism.

2. Split the dataset:

We use train_test_split to divide the data into training and testing sets, with 80% for training and 20% for testing.

3. Create and train the model:

We create a LinearRegression model instance and train it using the training data.

4. Make predictions and evaluate the model:

The model makes predictions on the test set, and we compute the Mean Squared Error (MSE) and Mean Absolute Error (MAE) to evaluate the model’s performance.

5. Plot the results:

We visualize the actual vs. predicted values to see how well the model has learned the relationship.
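For step 3, it can help to see what "training" means here. The fit chooses the intercept and slope that minimize the squared error, and with a single feature both have a closed form. The sketch below computes them directly with NumPy and compares the result with scikit-learn's `LinearRegression`. It illustrates the least-squares solution, not scikit-learn's internal implementation, and for simplicity it fits all 100 points rather than a training split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same synthetic setup as the post: y = 4 + 3X + Gaussian noise
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = (4 + 3 * X + np.random.randn(100, 1)).ravel()  # flatten y to 1-D

# Closed-form ordinary least squares for one feature:
#   slope     = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
#   intercept = y_mean - slope * x_mean
x = X[:, 0]
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# LinearRegression solves the same least-squares problem
model = LinearRegression().fit(X, y)

print(f"closed form: intercept={intercept:.4f}, slope={slope:.4f}")
print(f"sklearn    : intercept={model.intercept_:.4f}, slope={model.coef_[0]:.4f}")
```

Both lines should print the same numbers, and both should land near the true values of 4 and 3.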

Code in Python

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Generate a synthetic dataset
# Let's create a simple linear dataset with some noise
np.random.seed(0)
X = 2 * np.random.rand(100, 1)  # 100 data points
y = 4 + 3 * X + np.random.randn(100, 1)  # y = 4 + 3X + noise

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the linear regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

print(f'Mean Squared Error: {mse}')
print(f'Mean Absolute Error: {mae}')

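# Plot the results (sort by X so the fitted line is drawn from left to right)
order = X_test[:, 0].argsort()
plt.scatter(X_test, y_test, color='blue', label='Actual values')
plt.plot(X_test[order], y_pred[order], color='red', linewidth=2, label='Predicted values')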
plt.title('Simple Linear Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
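To check how close the fitted model is to the true relationship y = 4 + 3X, you can read the learned parameters from the model's `intercept_` and `coef_` attributes and compute the test-set R² with `model.score`. Here is a minimal sketch that reuses the same seeds and split as the code above:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Recreate the post's data and 80/20 split with the same seeds
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)

# y is a (100, 1) column, so intercept_ has shape (1,) and coef_ has shape (1, 1)
intercept = model.intercept_[0]
slope = model.coef_[0, 0]
r2 = model.score(X_test, y_test)  # coefficient of determination R^2 on the test set

print(f"Split sizes: {len(X_train)} train / {len(X_test)} test")
print(f"Learned: y = {intercept:.3f} + {slope:.3f}X  (true: y = 4 + 3X)")
print(f"Test R^2: {r2:.3f}")
```

Because the noise has a standard deviation of 1, the learned intercept and slope should be close to 4 and 3 but not exactly equal to them. The exact values depend on the random seeds.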

Code in R

First, install the necessary packages if you haven't already:

# Install caTools
install.packages("caTools")

# Install ggplot2
install.packages("ggplot2")

Then, the code is as follows:

# Load necessary libraries
library(caTools)
library(ggplot2)

# Generate a synthetic dataset
set.seed(0)
X <- 2 * runif(100)  # 100 data points uniformly distributed between 0 and 2
y <- 4 + 3 * X + rnorm(100)  # y = 4 + 3X + noise

# Combine X and y into a data frame
data <- data.frame(X, y)

# Split the dataset into training and testing sets
set.seed(42)
split <- sample.split(data$y, SplitRatio = 0.8)
train_data <- subset(data, split == TRUE)
test_data <- subset(data, split == FALSE)

# Create the linear regression model
model <- lm(y ~ X, data = train_data)

# Make predictions on the test set
y_pred <- predict(model, newdata = test_data)

# Evaluate the model
mse <- mean((test_data$y - y_pred)^2)
mae <- mean(abs(test_data$y - y_pred))

cat('Mean Squared Error:', mse, '\n')
cat('Mean Absolute Error:', mae, '\n')

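# Plot the results: actual test points vs. the fitted regression line
plot_data <- data.frame(X = test_data$X, y = test_data$y, y_pred = y_pred)
# Map color inside aes() so ggplot2 builds a legend; 'label' is not a geom_point/geom_line parameter
# 'linewidth' replaces the deprecated 'size' for lines (ggplot2 >= 3.4.0)
ggplot(plot_data, aes(x = X)) +
  geom_point(aes(y = y, color = 'Actual values')) +
  geom_line(aes(y = y_pred, color = 'Predicted values'), linewidth = 1) +
  scale_color_manual(values = c('Actual values' = 'blue', 'Predicted values' = 'red')) +
  labs(title = 'Simple Linear Regression', x = 'X', y = 'y', color = NULL) +
  theme_minimal()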
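The R version computes the metrics directly from their definitions, MSE = (1/n) Σ (y_i − ŷ_i)² and MAE = (1/n) Σ |y_i − ŷ_i|, while the Python version calls scikit-learn. The short Python sketch below shows that the two approaches agree. It uses a few made-up values, not output from this post.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Illustrative values only (not results from this post)
y_true = np.array([3.0, 5.5, 7.0, 9.5])
y_pred = np.array([3.5, 5.0, 7.5, 8.5])

residuals = y_true - y_pred
mse_manual = np.mean(residuals ** 2)     # same formula as R's mean((y - y_pred)^2)
mae_manual = np.mean(np.abs(residuals))  # same formula as R's mean(abs(y - y_pred))

print(mse_manual, mean_squared_error(y_true, y_pred))   # both 0.4375
print(mae_manual, mean_absolute_error(y_true, y_pred))  # both 0.625
```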

Discover more from Science Comics
