I have seen this topic pop up several times regarding normality of data. I thought I would bounce it off of everybody to see what your thoughts are.
Parametric statistics (say the t-test) make inferences about the mean. If the population is normal, the mean and median coincide. If a set of data is non-normal, the mean and median can differ, sometimes substantially. A non-parametric test (such as the Wilcoxon or Mann-Whitney) makes its inferences based on the median (or, more precisely, on ranks).
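To make that concrete, here is a quick sketch in Python (numpy/scipy assumed; the skewed exponential sample is made up purely for illustration) showing the mean and median separating once the data is skewed, and the two kinds of test side by side:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# A right-skewed sample (exponential), made up purely for illustration
skewed = rng.exponential(scale=10.0, size=200)

print("mean  :", np.mean(skewed))    # pulled upward by the long right tail
print("median:", np.median(skewed))  # noticeably smaller than the mean

# A one-sample t-test asks about the mean; the Wilcoxon signed-rank test
# asks about the location of the distribution (roughly, the median).
print(stats.ttest_1samp(skewed, popmean=10.0))
print(stats.wilcoxon(skewed - 10.0))
```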
If a set of data is ‘reasonably’ normal (judged by graphing; no real data is truly normal), then yes, parametric statistics are considerably more efficient and the proper choice. But with a small sample size, normality is hard to verify from the data alone.
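For the graphing part, my first look is usually a histogram plus a normal probability plot. A minimal sketch (Python, with numpy/scipy/matplotlib assumed; the sample of 17 is simulated here just to stand in for real measurements):

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=5, size=17)  # stand-in for your 17 measurements

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(sample, bins="auto")
ax1.set_title("Histogram")
stats.probplot(sample, dist="norm", plot=ax2)  # points near the line suggest normality
ax2.set_title("Normal probability plot")
plt.tight_layout()
plt.show()

# A formal check, though with n = 17 it has little power to detect non-normality
print(stats.shapiro(sample))
```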
Yes, the Central Limit Theorem (C.L.T.) does come into play. But the theorem deals with averages, not individual observations. If you have 17 samples, it is difficult in my mind (unless the data themselves look normal) to assume normality simply on the strength of the C.L.T. Sure, I can resample and average 7,000 times from those 17 numbers (bootstrapping), and guess what: normality! But in reality I still only have my 17 numbers.
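That resample-and-average exercise looks like this. A minimal sketch (Python, numpy assumed; the 17 values are made up to play the role of real data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the 17 measurements (made up for illustration)
data = rng.lognormal(mean=3.0, sigma=0.5, size=17)

# Resample with replacement and average, 7,000 times
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(7000)
])

# The distribution of bootstrap means looks roughly normal (the C.L.T. at work),
# but it is still built from only the original 17 observations.
print("original mean      :", data.mean())
print("bootstrap mean     :", boot_means.mean())
print("bootstrap std error:", boot_means.std(ddof=1))
print("95% percentile CI  :", np.percentile(boot_means, [2.5, 97.5]))
```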
Parametric statistics are easier and far more efficient. But unless your data show normality, non-parametric statistics should be used. NCSS reports both parametric and non-parametric tests. If the two tests give you the same conclusion, nothing is lost. If you get different results, a little more digging may be in order to determine which set of assumptions holds truer.
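I don't have NCSS handy, but the same side-by-side comparison is easy to sketch in Python (scipy assumed; the two groups are made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Two made-up groups to compare
group_a = rng.normal(loc=100, scale=10, size=17)
group_b = rng.normal(loc=108, scale=10, size=17)

# Parametric: two-sample t-test (inference on means)
t_res = stats.ttest_ind(group_a, group_b)

# Non-parametric: Mann-Whitney U (rank-based, roughly a comparison of medians)
u_res = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print("t-test        p =", t_res.pvalue)
print("Mann-Whitney  p =", u_res.pvalue)
# If the p-values lead to the same conclusion, nothing is lost;
# if they disagree, look harder at which assumptions actually hold.
```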
How does everyone deal with non-normality? Or do you?