When dealing with large samples, usually consisting of 30 or more data points, the z-test can be applied with minimal constraints. However, if outliers are present and their removal is not justified, it is recommended to conduct the hypothesis test both with and without the outliers to assess their impact. If the conclusion is affected, consider using an alternative method or gathering a new sample.
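The "with and without outliers" comparison can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the readings, the null mean `mu0 = 50`, and the known population standard deviation `sigma = 5` are all made up, and the 1.5 × IQR fence is just one common convention for flagging outliers, assumed here for convenience.

```python
import math
from statistics import NormalDist, mean, quantiles


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test with a known population standard deviation."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical sample of 30 readings; the last two look suspicious.
data = [48, 52, 51, 49, 50, 53, 47, 51, 50, 52,
        49, 48, 51, 50, 52, 49, 50, 51, 48, 53,
        50, 49, 51, 52, 50, 49, 51, 50, 75, 80]

mu0, sigma = 50, 5  # assumed null mean and known sigma (illustrative only)

# Flag outliers with the 1.5 * IQR rule (an assumed convention, not the only one).
q1, _, q3 = quantiles(data, n=4)
low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
trimmed = [x for x in data if low <= x <= high]

z_all, p_all = one_sample_z_test(data, mu0, sigma)
z_trimmed, p_trimmed = one_sample_z_test(trimmed, mu0, sigma)

print(f"with outliers:    z = {z_all:.3f}, p = {p_all:.4f}")
print(f"without outliers: z = {z_trimmed:.3f}, p = {p_trimmed:.4f}")
```

With these made-up readings, the full sample rejects the null hypothesis at the 5% level while the trimmed sample does not. That is the situation in which an alternative method or a new sample is worth considering.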
For moderate-sized samples, typically 15 to 30 observations, the z-test can be used unless the data contain outliers or the variable under consideration departs markedly from a normal distribution. Some further guidelines:
1. For small samples, typically fewer than 15 data points, use the z-test only if the variable in question is normally distributed or very close to it.
2. If outliers are present, but their removal is justified and results in a dataset suitable for the z-test, then you may proceed with the z-test.
3. Transformations such as the log, square root, or Box-Cox transform can make the data approximately normal.
4. A commonly used graphical tool for assessing normality is the Q-Q plot, which plots the ordered observations against the quantiles expected under a normal distribution.
5. Tests for assessing univariate normality include the Shapiro–Wilk test, Pearson’s chi-squared test, the Kolmogorov–Smirnov test, D’Agostino’s K-squared test, the Jarque–Bera test, the Anderson–Darling test, and the Cramér–von Mises criterion.
6. Suppose you wish to conduct a hypothesis test for a population mean with a small sample, but your preliminary data analysis reveals either the presence of outliers or a departure from normal distribution. In such cases, neither the z-test nor the t-test is suitable. However, under specific conditions, you can employ a nonparametric method. For instance, if the variable in question exhibits a symmetric distribution, you can utilize the Wilcoxon signed-rank test to conduct a hypothesis test for the population mean.
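Items 3 to 5 above can be tried out with the standard library alone. The sketch below computes the Jarque–Bera statistic with its asymptotic chi-squared(2) p-value, and the correlation of the Q-Q plot points as a rough numerical stand-in for eyeballing the plot, before and after a log transform. The data are hypothetical, the helper names are my own, and the Jarque–Bera p-value is only an approximation in samples this small. In practice a library such as SciPy (`scipy.stats.shapiro`, `scipy.stats.jarque_bera`, `scipy.stats.probplot`) would be the usual choice.

```python
import math
from statistics import NormalDist, mean


def jarque_bera(sample):
    """Jarque-Bera statistic and its asymptotic chi-squared(2) p-value."""
    n = len(sample)
    m = mean(sample)
    m2 = sum((x - m) ** 2 for x in sample) / n
    m3 = sum((x - m) ** 3 for x in sample) / n
    m4 = sum((x - m) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # For 2 degrees of freedom the chi-squared survival function is exp(-x / 2).
    return jb, math.exp(-jb / 2)


def qq_points(sample):
    """Theoretical normal quantiles paired with the ordered observations."""
    n = len(sample)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, sorted(sample)


def pearson_r(xs, ys):
    """Pearson correlation; values near 1 mean the Q-Q points are nearly linear."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical right-skewed measurements and their log transform.
raw = [1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 2.8, 3.1,
       3.5, 4.0, 4.8, 5.9, 7.5, 10.2, 15.6, 24.0]
logged = [math.log(x) for x in raw]

results = {}
for label, values in (("raw", raw), ("log", logged)):
    jb, p = jarque_bera(values)
    q_theory, q_sample = qq_points(values)
    results[label] = {"jb": jb, "p": p, "qq_r": pearson_r(q_theory, q_sample)}
    print(f"{label}: JB = {jb:.2f}, p = {p:.4f}, Q-Q correlation = {results[label]['qq_r']:.3f}")
```

For these made-up values the log transform brings the Q-Q points closer to a straight line and lowers the Jarque–Bera statistic, which is the kind of improvement a transformation is meant to deliver.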
It’s worth noting that most nonparametric methods do not rely on approximate normality, are robust against outliers and extreme values, and can be applied regardless of sample size. However, parametric methods like the z-test and t-test tend to yield more accurate results when the assumptions of normality and other prerequisites are met.
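As an illustration of the nonparametric route, here is a small sketch of the Wilcoxon signed-rank test mentioned above. It computes an exact two-sided p-value by enumerating the null distribution of the positive-rank sum, which is practical for small samples. The sample and the null value `mu0 = 100` are hypothetical, and the function assumes a symmetric distribution, no zero differences, and no tied absolute differences. A library routine such as `scipy.stats.wilcoxon` handles the more general cases.

```python
def wilcoxon_signed_rank(sample, mu0):
    """Exact two-sided Wilcoxon signed-rank test of H0: center equals mu0.

    Assumes a symmetric distribution, no zero differences, and no tied
    absolute differences, so the ranks are exactly the integers 1..n.
    """
    diffs = [x - mu0 for x in sample]
    if any(d == 0 for d in diffs):
        raise ValueError("zero differences are not handled in this sketch")
    abs_sorted = sorted(abs(d) for d in diffs)
    if len(set(abs_sorted)) != len(abs_sorted):
        raise ValueError("tied absolute differences are not handled in this sketch")

    # Rank the absolute differences, then sum the ranks of the positive ones.
    rank = {a: i + 1 for i, a in enumerate(abs_sorted)}
    w_plus = sum(rank[abs(d)] for d in diffs if d > 0)

    # counts[s] = number of subsets of the ranks {1..n} whose sum is s.
    # Under H0 every subset of ranks is equally likely to be the positive set.
    n = len(diffs)
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]

    total = 2 ** n
    lower = sum(counts[: w_plus + 1]) / total  # P(W+ <= observed)
    upper = sum(counts[w_plus:]) / total       # P(W+ >= observed)
    p_value = min(1.0, 2 * min(lower, upper))
    return w_plus, p_value


# Hypothetical small sample (n = 10) tested against an assumed null value of 100.
sample = [103, 98, 107, 111, 95, 114, 109, 101, 116, 112]
w_plus, p_value = wilcoxon_signed_rank(sample, mu0=100)
print(f"W+ = {w_plus}, exact two-sided p = {p_value:.4f}")
```

Because the test uses only the signs and ranks of the differences, a single extreme observation can change the statistic by at most its rank, which is where the robustness to outliers comes from.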