Non-Normal Data - Transforming the Data to Normal

Jon O

Normalizing Data

Hello All,

If a group of data is non-normal and we want to try to transform the data to normal, please explain some of the tools being used for the transformation.

In working with some of our Six Sigma customers, one method they use on a regular basis is based on the Central Limit Theorem: they group 2 to 3 data points, take the average, and then assess normality and calculate capability on the averaged data.

What are the thoughts of other cove members on this method?

Any input would be appreciated.

Regards,

Jon

Rick Goodson

Jon O,

It is difficult to comment on your question without more information. How did you ascertain that the data is non-normal? Did you use a standard test for normality or a graphical approach? What type of process is the data taken from? A little more information on the process would help. Nevertheless, here is a note on the transformation based on the Central Limit Theorem.

The Central Limit Theorem states that, irrespective of the shape of the distribution of a universe, the distribution of average values (X-bars) of subgroups of size n drawn from that universe will tend toward a normal distribution as the subgroup size n grows without bound. The value of n does not have to be very large before the normal distribution may be applied. This is very useful in analyzing probabilities, but it does not form the basis for control charts with +/- 3 sigma limits (reference: Statistical Quality Control by Grant and Leavenworth, seventh edition).
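To make the effect of subgroup size concrete, here is a minimal simulation sketch. It is not taken from Grant and Leavenworth; the exponential "universe", the sample size, and the helper name `subgroup_means` are all assumptions chosen for illustration. It prints the skewness of the averages for several subgroup sizes, so one can see how much non-normality is left after averaging only 2 or 3 points.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumed skewed "universe": exponential individual readings (illustrative only).
individuals = rng.exponential(scale=1.0, size=120_000)


def subgroup_means(x, n):
    """Average consecutive, non-overlapping subgroups of size n."""
    usable = len(x) - len(x) % n  # drop any leftover points
    return x[:usable].reshape(-1, n).mean(axis=1)


# Skewness of the averages for several subgroup sizes (0 would be symmetric).
skewness = {n: float(stats.skew(subgroup_means(individuals, n)))
            for n in (1, 2, 3, 5, 30)}

for n, s in skewness.items():
    print(f"n={n:>2}  skewness of averages = {s:.2f}")
```

For a strongly skewed source like this one, the averages of 2 or 3 points are still visibly skewed; the skewness shrinks only gradually as n grows.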

There are a number of data transformation methods available. You might also consider the Weibull distribution, as it is applicable to a wide variety of variation patterns, including departures from both the normal and exponential distributions.

Jon O

Rick,

Thanks for the quick response. The non-normality was ascertained by performing an Anderson-Darling test for normality and reviewing the p-value. The data being analyzed is from an insertion/force test.
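For readers without Minitab, a rough equivalent of this check can be sketched in Python with `scipy.stats.anderson`. The lognormal "force" readings below are made-up stand-in data, not the actual insertion/force measurements. In the call used here, SciPy compares the statistic against tabulated critical values instead of printing a p-value the way Minitab does.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical stand-in for force readings: right-skewed lognormal data.
force = rng.lognormal(mean=2.0, sigma=0.5, size=200)

result = stats.anderson(force, dist="norm")

# critical_values line up with significance_level, which is given in percent.
idx_5pct = int(np.argmin(np.abs(np.asarray(result.significance_level) - 5.0)))
reject_at_5pct = bool(result.statistic > result.critical_values[idx_5pct])

print(f"A-squared statistic: {result.statistic:.3f}")
print(f"5% critical value:   {result.critical_values[idx_5pct]:.3f}")
print("Reject normality at 5%:", reject_at_5pct)
```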

Nevertheless, do you feel it is OK to calculate capability on a data set where the CLT has been applied?

Is there any point where you don't go any further with the CLT? Averaging 2, 3, 4 data points... where do you stop?

Should the CLT be considered a first-round tool to normalize data, or are we using statistics to fudge a data set that is truly messed up?
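One way to see what is at stake in these questions: averages of n points have a standard deviation of roughly sigma / sqrt(n), so a capability index computed on the averages against the original spec limits comes out about sqrt(n) times larger than the index on individual readings. The sketch below uses made-up normal data and hypothetical spec limits, and a plain Cp based on the overall standard deviation; these are assumptions for illustration, not anyone's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(2)

LSL, USL = 0.0, 12.0  # hypothetical spec limits on individual parts
individuals = rng.normal(loc=6.0, scale=2.0, size=30_000)


def cp(x, lsl, usl):
    """Cp using the overall sample standard deviation (a simplification)."""
    return (usl - lsl) / (6.0 * np.std(x, ddof=1))


n = 3  # subgroup size used for averaging
averages = individuals.reshape(-1, n).mean(axis=1)

cp_individuals = cp(individuals, LSL, USL)
cp_averages = cp(averages, LSL, USL)
ratio = cp_averages / cp_individuals

print(f"Cp on individuals: {cp_individuals:.2f}")
print(f"Cp on averages:    {cp_averages:.2f}")
print(f"ratio:             {ratio:.2f}  (sqrt(n) = {np.sqrt(n):.2f})")
```

Since the spec limits apply to individual parts, the index on the averages describes the averages, not the parts that are actually shipped.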

Thanks,

Jon

jasshe

Minitab R14 helps us transform the data to follow a normal distribution in at least the following two ways:

1. Box-Cox transformation
2. Johnson transformation
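
Outside Minitab, the Box-Cox step can be sketched with SciPy as below (the Johnson transformation is not shown). The lognormal data and the upper spec limit are invented for illustration. `scipy.stats.boxcox` estimates lambda by maximum likelihood and requires strictly positive data; the same lambda is then applied to the spec limit so that both are on the same scale.

```python
import numpy as np
from scipy import special, stats

rng = np.random.default_rng(3)

# Hypothetical right-skewed, strictly positive measurements.
data = rng.lognormal(mean=1.0, sigma=0.6, size=500)
USL = 12.0  # hypothetical upper spec limit in original units

# Estimate lambda and transform the data in one call.
transformed, lam = stats.boxcox(data)

skew_before = float(stats.skew(data))
skew_after = float(stats.skew(transformed))

# Apply the same lambda to the spec limit.
usl_transformed = float(special.boxcox(USL, lam))

print(f"estimated lambda: {lam:.3f}")
print(f"skewness before:  {skew_before:.2f}")
print(f"skewness after:   {skew_after:.2f}")
print(f"USL transformed:  {usl_transformed:.3f}")
```

Because the transformation is monotonic, the observed fraction of points beyond the limit is the same before and after; what changes is the fitted normal model used to estimate the tail.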

Darius

As I said, my own favorite is Box-Cox (even as a regression tool: together with the hyperbolic model, it is the first regression model I use when I see good behavior in a nonlinear curve and when I try to find an equation for a curve found in a book), but for capability I stick to non-parametric methods (median and percentiles).

I get worried when I try to understand the meaning of capability when transformations are involved. Of course the specs can be transformed too, but... I feel something is missing.
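
A minimal sketch of a median-and-percentile capability calculation is given below. The exact formula is an assumption (the commonly used percentile form, with the 0.135th and 99.865th percentiles standing in for the -3 and +3 sigma points), not necessarily the one meant in this post; the function name, the data, and the spec limits are hypothetical. It works in original units, so no back-transformation is needed, but the extreme percentiles need a lot of data to be estimated well.

```python
import numpy as np


def percentile_capability(x, lsl, usl):
    """Percentile-based Pp and Ppk (assumed form; see the note above)."""
    p_lo, med, p_hi = np.percentile(x, [0.135, 50.0, 99.865])
    pp = (usl - lsl) / (p_hi - p_lo)
    ppu = (usl - med) / (p_hi - med)  # upper side, relative to the median
    ppl = (med - lsl) / (med - p_lo)  # lower side, relative to the median
    return float(pp), float(min(ppu, ppl))


rng = np.random.default_rng(4)
# Hypothetical skewed measurements and spec limits, in original units.
data = rng.lognormal(mean=1.0, sigma=0.4, size=50_000)
pp, ppk = percentile_capability(data, lsl=0.5, usl=15.0)

print(f"Pp  (percentile method): {pp:.2f}")
print(f"Ppk (percentile method): {ppk:.2f}")
```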

jasshe

Darius said:
As I said, my own favorite is Box-Cox (even as a regression tool: together with the hyperbolic model, it is the first regression model I use when I see good behavior in a nonlinear curve and when I try to find an equation for a curve found in a book), but for capability I stick to non-parametric methods (median and percentiles).

I get worried when I try to understand the meaning of capability when transformations are involved. Of course the specs can be transformed too but...., I.

Happy to discuss Box-Cox with all of you, but I do not understand what you mean when you say you feel something is missing.
Do you not get enough information about your process capability from these methods?

Darius

As I said, the best capability index is obtained with Box-Cox, as shown in the article in Quality and Reliability Engineering International,
"Computing Process Capability Indices for Non-Normal Data: A Review and Comparative Study" by Loon Ching Tang and Su Ee Than.

It is the best comparison of capability indices I have ever seen.

But, as Wheeler wrote in Advanced Topics in Statistical Process Control:

"If the users have not already developed the ability to handle the mathematical abstraction of thinking about transforming a given set of data in different ways, the introduction of transformations as part of a statistical analysis will tend to confuse the results and confound the user"

"... and the treatment effects must be transformed back. This inverse transformation will present difficulties of interpretation that will overwhelm many"

"The best analysis is that analysis which provides the greatest insight with the simplest technique."

If the process capability is going to be presented to somebody else, the use of transformations, as D. Wheeler pointed out, could make the results difficult to interpret.