Marriage and Divorce Rate (Correlation Test)

Suppose we want to investigate whether there is a relationship between the marriage rate and the divorce rate in different regions. We can use the Pearson correlation coefficient to test for a linear relationship between these two variables. Here is how we set up the correlation test and perform the analysis.

Step 1: State the Hypotheses

The null hypothesis H_0 and the alternative hypothesis H_A can be stated as follows:
H_0: \rho = 0 \quad \text{(no linear relationship between marriage rate and divorce rate)}
H_A: \rho \neq 0 \quad \text{(there is a linear relationship between marriage rate and divorce rate)}
where \rho is the population correlation coefficient.

Step 2: Collect Data

Suppose we have the following sample data on the marriage rate and divorce rate (both per 1000 people) in 10 different regions:

Region:         1    2    3    4    5    6    7    8    9    10
Marriage rate:  8.2  7.8  9.0  6.9  7.4  8.6  7.1  8.0  7.3  8.4
Divorce rate:   4.3  4.1  4.8  3.7  3.9  4.5  3.8  4.2  3.9  4.4

Step 3: Calculate the Correlation Coefficient

The Pearson correlation coefficient r is calculated using the formula:
r = \frac{n \sum (XY) - \sum X \sum Y}{\sqrt{[n \sum X^2 - (\sum X)^2][n \sum Y^2 - (\sum Y)^2]}}
where X represents the marriage rate and Y represents the divorce rate.

Substituting the sample sums
\sum X = 78.7, \quad \sum Y = 41.6, \quad \sum XY = 329.54, \quad \sum X^2 = 623.67, \quad \sum Y^2 = 174.14
into the formula gives:

r = \frac{10(329.54) - (78.7)(41.6)}{\sqrt{[10(623.67) - (78.7)^2][10(174.14) - (41.6)^2]}} = \frac{21.48}{\sqrt{(43.01)(10.84)}} \approx 0.9948
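
As a quick check on this hand calculation, the same sums can be plugged in with NumPy (a minimal sketch; the array names are illustrative, not taken from the scripts below):

import numpy as np

# Sample data: marriage and divorce rates (per 1000 people) for the 10 regions
X = np.array([8.2, 7.8, 9.0, 6.9, 7.4, 8.6, 7.1, 8.0, 7.3, 8.4])  # marriage rate
Y = np.array([4.3, 4.1, 4.8, 3.7, 3.9, 4.5, 3.8, 4.2, 3.9, 4.4])  # divorce rate
n = len(X)

# Pearson r from the sum formula used above
numerator = n * np.sum(X * Y) - np.sum(X) * np.sum(Y)
denominator = np.sqrt((n * np.sum(X**2) - np.sum(X)**2) * (n * np.sum(Y**2) - np.sum(Y)**2))
r = numerator / denominator

print(f"r from the sum formula: {r:.4f}")                        # ~0.9948
print(f"r from np.corrcoef:     {np.corrcoef(X, Y)[0, 1]:.4f}")  # same value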

Step 4: Determine the Degrees of Freedom

The degrees of freedom for the correlation test are:
df = n - 2 = 10 - 2 = 8

Step 5: Compare the Test Statistic to the Critical Value

Using a t-table or calculator, find the critical t-value for a two-tailed test with \alpha = 0.05 and df = 8. The critical t-value is approximately \pm 2.306.

We convert the correlation coefficient to a t-statistic:
t = \frac{r \sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.9948 \sqrt{10-2}}{\sqrt{1-0.9948^2}} \approx 27.6

Since t \approx 27.6 is greater than the critical value 2.306, we reject the null hypothesis.
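
Both numbers can be reproduced with scipy.stats (a minimal sketch; it assumes the rounded r from Step 3):

import numpy as np
from scipy import stats

r = 0.9948      # correlation coefficient from Step 3
n = 10          # number of regions
alpha = 0.05
df = n - 2

t_critical = stats.t.ppf(1 - alpha / 2, df)         # two-tailed critical value, about 2.306
t_statistic = r * np.sqrt(df) / np.sqrt(1 - r**2)   # about 27.6

print(f"Critical t-value: ±{t_critical:.3f}")
print(f"Test statistic:   {t_statistic:.2f}")
print("Reject H0" if abs(t_statistic) > t_critical else "Fail to reject H0")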

Conclusion

At the \alpha = 0.05 significance level, we reject the null hypothesis and conclude that there is a significant positive linear relationship between the marriage rate and the divorce rate across these regions.
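
The same decision can be stated through a p-value: under H_0 the statistic follows a t distribution with df = 8, so the two-tailed p-value implied by t \approx 27.6 comes from the distribution's survival function. A minimal sketch, assuming the rounded values from the steps above:

from scipy import stats

t_statistic = 27.6   # from Step 5
df = 8               # from Step 4
alpha = 0.05

# Two-tailed p-value: probability of a |t| at least this large under H0
p_value = 2 * stats.t.sf(abs(t_statistic), df)

print(f"p-value: {p_value:.2e}")   # far below 0.05
print("Reject H0" if p_value < alpha else "Fail to reject H0")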

Code

Python

import numpy as np
from scipy import stats

# Data
marriage_rate = np.array([8.2, 7.8, 9.0, 6.9, 7.4, 8.6, 7.1, 8.0, 7.3, 8.4])
divorce_rate = np.array([4.3, 4.1, 4.8, 3.7, 3.9, 4.5, 3.8, 4.2, 3.9, 4.4])

# Calculate Pearson correlation coefficient and p-value
correlation_coefficient, p_value = stats.pearsonr(marriage_rate, divorce_rate)

# Calculate degrees of freedom
n = len(marriage_rate)
degrees_of_freedom = n - 2

# Calculate t-statistic
t_statistic = correlation_coefficient * np.sqrt(degrees_of_freedom / (1 - correlation_coefficient**2))

# Print results
print(f"Pearson correlation coefficient: {correlation_coefficient}")
print(f"p-value: {p_value}")
print(f"t-statistic: {t_statistic}")
print(f"Degrees of freedom: {degrees_of_freedom}")

R

# Data
marriage_rate <- c(8.2, 7.8, 9.0, 6.9, 7.4, 8.6, 7.1, 8.0, 7.3, 8.4)
divorce_rate <- c(4.3, 4.1, 4.8, 3.7, 3.9, 4.5, 3.8, 4.2, 3.9, 4.4)

# Calculate Pearson correlation coefficient and p-value
correlation_test <- cor.test(marriage_rate, divorce_rate)

# Calculate degrees of freedom
n <- length(marriage_rate)
degrees_of_freedom <- n - 2

# Calculate t-statistic
t_statistic <- correlation_test$estimate * sqrt(degrees_of_freedom / (1 - correlation_test$estimate^2))

# Print results
cat("Pearson correlation coefficient:", correlation_test$estimate, "\n")
cat("p-value:", correlation_test$p.value, "\n")
cat("t-statistic:", t_statistic, "\n")
cat("Degrees of freedom:", degrees_of_freedom, "\n")

