# Normality Assumption - Most Tests 'Assume' a Normal Distribution - t test statistic


#### KenK - 2009

Does anyone know why most tests of means involving the t-statistic assume that the individual data values are normally distributed?

The t-statistic in the most common tests of means involves only the mean and its standard error (s/sqrt(n)), and not the individual values themselves.

The Central Limit Theorem says that sample means will tend to follow a normal distribution as the sample size gets larger, regardless of the distribution of the individual data values. So, assuming sample sizes are largish, the distribution of the individual data values shouldn't really matter.

I started wondering whether the sample standard deviation calculated from a highly skewed distribution results in the proper standard error for the respective distribution of the means.

I ran some simulations using Crystal Ball and was pleased to find that the sample standard deviation divided by sqrt(n) does indeed provide a very good estimate of the standard deviation of the distribution of the respective means. Pretty cool.
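That first simulation can be sketched in Python instead of Crystal Ball (the exponential parent distribution and the choice of n = 25 are my own illustrative assumptions, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility
n, runs = 25, 10_000

# Draw many samples from a highly skewed distribution (exponential, sigma = 1)
samples = rng.exponential(scale=1.0, size=(runs, n))

means = samples.mean(axis=1)                              # one xbar per run
se_estimates = samples.std(axis=1, ddof=1) / np.sqrt(n)   # s/sqrt(n) per run

# Compare the average estimated standard error with the actual
# spread of the sample means across runs; both should be near
# sigma/sqrt(n) = 1/sqrt(25) = 0.2 despite the heavy skew.
print(f"mean of s/sqrt(n):  {se_estimates.mean():.4f}")
print(f"SD of sample means: {means.std(ddof=1):.4f}")
```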

I also ran some simulations to get a feel for how big the sample sizes need to be for normality of the means. Here are the results:

If the individual data distribution is symmetric (I used the Uniform distribution to represent a symmetric nonnormal distr.) then even a fairly small sample size (n=5) resulted in means which are quite normally distributed (10,000 runs not significantly different from normal using MINITAB's Anderson-Darling test).

If the individual data distribution is not symmetric (I used an Exponential distribution which is very skewed right) then even a fairly large sample size (n=50) resulted in means which are only marginally normally distributed (10,000 runs give fairly straight prob. plot, but MINITAB's Anderson-Darling test P-Value = 0.000)
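Both sample-size experiments above can be re-created in a few lines, with scipy's Anderson-Darling statistic standing in for MINITAB's test (distributions and sample sizes as described in the post):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
runs = 10_000

# Symmetric nonnormal case: means of n=5 Uniform(0,1) draws
uniform_means = rng.uniform(size=(runs, 5)).mean(axis=1)

# Skewed case: means of n=50 Exponential draws
expo_means = rng.exponential(size=(runs, 50)).mean(axis=1)

# Anderson-Darling normality test (stand-in for MINITAB's version)
u_res = stats.anderson(uniform_means, dist="norm")
e_res = stats.anderson(expo_means, dist="norm")

print(f"uniform n=5:      A-D statistic = {u_res.statistic:.3f}")
print(f"exponential n=50: A-D statistic = {e_res.statistic:.3f}")
# With 10,000 means, the skewed parent leaves a much larger
# departure from normality than the symmetric one.
```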


#### Graeme

Re: Normality Assumption

Originally posted by KenK
Does anyone know why most tests of means involving the t-statistic assume that the individual data values are normally distributed? ...

Ken,

At the risk of being overly simplistic, I think the short answer to your initial question is that the t distribution is a property of the Normal distribution. This property was discovered early in the 20th century and published in 1908 by W. S. Gosset, under the pseudonym "Student".

For very large samples (generally n > 100) the t distribution is essentially the same as the standardized Normal (Z) curve with a mean of 0. The t statistic was developed from work with small samples, however, and that is where it is usually used. (In this context, "small" is generally taken to mean a sample size of less than 30.) Typical uses include estimating the uncertainty in the mean, and testing to see if the means of two populations are equal. The main assumptions are that the populations are normally distributed (as you said in your question) and that you are using small samples and n - 1 degrees of freedom.

If the population is known to be non-normal (skewed), then the use of the t statistic is not appropriate. In those cases distribution-free tests should be used, such as Wilcoxon-Mann-Whitney, Kolmogorov-Smirnov and others.
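The distribution-free tests named above are both available in scipy; here is a hedged sketch (the two exponential samples and their sizes are my own illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Two skewed samples that genuinely differ (different scale parameters)
a = rng.exponential(scale=1.0, size=200)
b = rng.exponential(scale=2.0, size=200)

# Wilcoxon-Mann-Whitney rank-sum test: no normality assumption
u_stat, u_p = stats.mannwhitneyu(a, b, alternative="two-sided")

# Two-sample Kolmogorov-Smirnov test: compares the full distributions
ks_stat, ks_p = stats.ks_2samp(a, b)

print(f"Mann-Whitney p = {u_p:.2e}, KS p = {ks_p:.2e}")
```

Both tests rely on ranks or empirical distribution functions rather than on a normal population, which is why they remain valid for skewed data.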

Many statistics texts have good discussions of this statistic, and some even have the mathematical proofs if you are so inclined. If you want, I can give you a list of the ones I have on my bookshelf.


#### KenK - 2009

No, it doesn't really address the question I posed. The t-test statistics focus on the distribution of xbar and the resulting t statistic, not x.

In one of the simplest t-tests - a one-sample test of a mean - the statistic under consideration is the sample mean, xbar.

If we knew the population standard deviation, we could use the z statistic:

z = (xbar - Mu) / (Sigma/sqrt(n))

If I have a fairly symmetric distribution of x, with a moderately large sample size n, then I can show that the distribution of xbar will be VERY VERY close to normal with mean Mu (based upon my Null Hypothesis) and standard deviation estimated by s/sqrt(n), where s is the sample standard deviation of the x values.

In turn, I can then say that z will have a Normal(0,1) distribution.

Since I typically won't know Sigma, I use the sample standard deviation, s, instead. Now, since I know xbar is VERY VERY close to being distributed Normal(Mu, Sigma/sqrt(n)), the t statistic

t = (xbar - Mu) / (s/sqrt(n))

will then follow a central t distribution with (n - 1) degrees of freedom.

I can calculate the t statistic as above using sample data, and then compare the statistic I get with the t(n - 1) distribution. If my test statistic t is very extreme, then I would reject the null hypothesis.

I have just completed the one-sample t-test comparing a sample mean to a hypothetical value without ever needing to make an assumption about the distribution of x being Normal(Mu, Sigma).
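The one-sample test just walked through can be written out directly and cross-checked against scipy's built-in version (the sample values here are simulated, purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(loc=10.2, scale=1.0, size=20)  # illustrative sample data
mu0 = 10.0                                    # hypothesized mean (H0)

n = len(x)
xbar = x.mean()
s = x.std(ddof=1)
t = (xbar - mu0) / (s / np.sqrt(n))           # t = (xbar - Mu) / (s/sqrt(n))
p = 2 * stats.t.sf(abs(t), df=n - 1)          # two-sided p from t(n - 1)

# Cross-check against scipy's one-sample t-test
t_ref, p_ref = stats.ttest_1samp(x, mu0)
print(f"t = {t:.4f} (scipy {t_ref:.4f}), p = {p:.4f} (scipy {p_ref:.4f})")
```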

So, back to my original question: why do many (most?) texts specify that assumption?

In my mind the only distribution assumption I needed was that xbar was normally distributed. Of course, there is also the assumption of independence.


#### Graeme

it was late on a dark and stormy night ...

Ken,

AHA! -- I see I must have misunderstood your original post.

On the other hand, I think what you are describing with the t-test is actually just a specific application of the general principle.

I just skimmed through several of my books again. All of the "statistics" books I have use a population, or an ungrouped random sample, in their examples. In that case, it is reasonable for them to state that the population must be normal, which is correct. They are dealing with individual points, not a collection of averages (subgroups).

However, my "QA/QC" books - which discuss the t statistic in the context of process description (control) charts - do not necessarily make the same statement. But they do not always state the assumptions, either.

The usual assumption is that the control chart is for averages, and each point on the chart is therefore the average of a small sample. Because of that, the Central Limit Theorem is at work. The distribution of the sample averages will tend to be approximately normal regardless of the distribution they are drawn from. That is what allows the t-test to work as you so clearly described.

So, to use the t-test the data must be normally distributed. But if the data consists of averages of small samples, then it will be approximately normal by the Central Limit Theorem, regardless of the distribution of the underlying population.

The main restriction I have noticed is that Besterfield says the sample size should be at least four. (Quality Control, 4th edition, 1994)

Graeme


#### KenK - 2009

Well at least one other person in this big world understands me (not including my wife, of course).

Thanks for chiming in.

Ken K.


#### Dave Strouse

Assumptions of normality in the underlying distribution

The assumptions of normality in the underlying distribution are based, I believe, on the theoretical derivation of the t-distribution. The proof, which I don't have at hand, proceeds from sampling from a N(mu, sigma) distribution and shows that this leads to the t-distribution, which involves a somewhat involved ratio of Gamma functions.
However, a more practical question is: how important is this assumption in actual usage?

I think George Box and the Hunters address this to some degree in their "Statistics for Experimenters".

But the best discussion I've seen is from a 40 year old book by Rupert Miller called "Beyond ANOVA". I got it out of a library where I previously worked and don't have a copy myself. I checked AMAZON.com and they can get it.

Miller talks about the t-test's robustness to the normality assumption. I think he says that for groups larger than 8 or 9, the underlying distribution really does not matter much. It is much more important to be sure that the sample is independent and random. He gives some graphical means to check normality and also offers nonparametric alternatives if you can't get the minimum sample size needed for robustness to non-normality. A handy little book; maybe I'll buy it someday.
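That robustness claim is easy to probe by simulation. This sketch (my own setup, not from Miller's book) checks the actual type I error rate of the t-test when the data come from a skewed population:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, runs, alpha = 10, 20_000, 0.05

# Data from a skewed parent whose true mean is 1 (exponential, scale 1),
# so H0: mu = 1 is TRUE and every rejection is a type I error.
x = rng.exponential(scale=1.0, size=(runs, n))
res = stats.ttest_1samp(x, popmean=1.0, axis=1)
rejection_rate = (res.pvalue < alpha).mean()

# If the t-test were exact here, this would print roughly 0.05;
# skewness at small n typically distorts it somewhat.
print(f"observed type I error rate: {rejection_rate:.3f} (nominal {alpha})")
```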

Hope this helps.