Transforming or not Transforming - Dealing with Non-Normal Data


rafael_josem

Hello Guys,

Lately, I have been doing some research about what to do when you have non-normal data for control charting purposes. I have found that some say it doesn't matter, while others say you should transform your data.

Can anyone help me clarify this? I just got confused :confused:

Thanks!
 

bobdoering

Stop X-bar/R Madness!!
Trusted Information Resource
Lately, I have been doing some research about what to do when you have non-normal data for control charting purposes. I have found that some say it doesn't matter, while others say you should transform your data.

Depends on what is causing your data to be non-normal, especially what kind of process it is. Transforming is used more in capability than control charting.
 

rafael_josem

Let's say that the process is non-normal. This is very common, at least in my experience. I haven't seen a normal distribution yet.

So, for these cases, do you have to transform or is it ok to use the data as it is for control charting?

Also, I would appreciate any good references that you can provide me.
 

bobdoering

Stop X-bar/R Madness!!
Trusted Information Resource
You can find some basic information on non-normal SPC in Don Wheeler's book "Normality and the Process Behavior Chart". It is effective if your process variation is from one source, and independent.

If it is from a consistent source of variation, such as tool wear, a resource on how to handle that is my book "CorrectSPC - When 'normal' is not typical; A guide for precision machining statistical process control".

Neither reference uses transformation.

You may also have a multi-modal distribution, as determined by the total variance equation. In this case you may need to spend time reducing some of the contributing variances prior to SPC.

I agree, outside of "natural" variation, variation caused by excessive human intervention (running to the mean), or variation masked by significant measurement or gage error, the normal distribution is not common.
 

Darius

Don Wheeler, in his book "Advanced topics on SPC", said (more or less) that the problem with transforming is that the technique is usually very difficult for operators to understand, so if it costs your charts their "readability" it is not recommended.

He also pointed out (in the same book) that no matter which distribution your data follows, SPC will work OK. IMHO this can be true, but beware of the detection rules: they really are defined for the Gaussian ("Normal") distribution.

As others in this thread have said, there is no perfect "Normal" distribution. In my experience, if the value is affected by time it will usually not be "Normal". Check the rule alarms: too many will turn your chart into a nice Christmas tree, and what is the use of a chart if it doesn't tell you something to improve your process?
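To put rough numbers on the point that the detection rules are defined for a Gaussian distribution, here is a minimal Python sketch. It assumes the true mean and standard deviation are known (a real chart estimates its limits from the data), and the three distributions are textbook stand-ins chosen for illustration, not anything taken from this thread. It computes the chance that a single in-control point falls outside mean +/- 3 standard deviations.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_beyond_3_sigma():
    """Chance of one point outside mean +/- 3 sd, assuming known mean and sd."""
    # Standard normal: both tails beyond 3 sd.
    normal = 2.0 * (1.0 - normal_cdf(3.0))

    # Exponential with rate 1: mean 1, sd 1, so the limits sit at -2 and 4.
    # Values cannot go below 0, so only the upper tail P(X > 4) counts.
    exponential = math.exp(-4.0)

    # Uniform on [0, 1]: mean 0.5, sd 1/sqrt(12).
    sd = 1.0 / math.sqrt(12.0)
    lower, upper = 0.5 - 3.0 * sd, 0.5 + 3.0 * sd
    above = max(0.0, 1.0 - upper)  # probability mass above the upper limit
    below = max(0.0, lower)        # probability mass below the lower limit
    uniform = above + below

    return {"normal": normal, "exponential": exponential, "uniform": uniform}


for name, p in prob_beyond_3_sigma().items():
    print(f"{name:12s} {p:.4%}")
```

Under these assumptions the normal case comes out near 0.27%, the exponential near 1.8%, and the uniform at 0. The 3-sigma limits still cover the bulk of the data in every case, but the false alarm rate that the run rules take for granted no longer holds exactly.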
 

bobdoering

Stop X-bar/R Madness!!
Trusted Information Resource
Don Wheeler, in his book "Advanced topics on SPC", said (more or less) that the problem with transforming is that the technique is usually very difficult for operators to understand, so if it costs your charts their "readability" it is not recommended.

This is a point that both Shewhart and I agree with him on in our books:
"Rule No. 1: Original data should be presented in a way that will preserve the evidence of the original data for all the predictions assumed to be useful."
- Dr. Walter A. Shewhart; Statistical Method from the Viewpoint of Quality Control

Transformation generally masks the original evidence.

He also pointed out (in the same book) that no matter which distribution your data follows, SPC will work OK. IMHO this can be true, but beware of the detection rules: they really are defined for the Gaussian ("Normal") distribution.

I agree. His point is specifically that the "signals" traditional SPC provides exist no matter what the distribution. However, he does not test the validity of the WEC rules, most of which are designed to ensure the process is varying about the mean in a random manner.

As others in this thread have said, there is no perfect "Normal" distribution. In my experience, if the value is affected by time it will usually not be "Normal". Check the rule alarms: too many will turn your chart into a nice Christmas tree, and what is the use of a chart if it doesn't tell you something to improve your process?

Yes, that is the issue I deal with in my book. There are some natural variations. One example I use is loaves of bread coming out of an automated bakery oven. Most are a particular height - some a little higher, some a little lower. You would likely see a normal distribution there....unless it is Wonder Bread. Then they are exactly the same - and no one is sure how they do that! There are some heat treaters that would love to know the secret, though!
 

SPC_Newbie

I agree. His point is specifically that the "signals" traditional SPC provides exist no matter what the distribution. However, he does not test the validity of the WEC rules, most of which are designed to ensure the process is varying about the mean in a random manner.

Do we have a list of detection rules for non-normal data?
Thx!
 

bobdoering

Stop X-bar/R Madness!!
Trusted Information Resource
Do we have a list of detection rules for non-normal data?

The correct answer to any question: it depends.

We know, for example, that with tool wear on an OD an upward trend is totally acceptable. The number of points above and below the mean is irrelevant (because the correct curve is not random about the mean).

But points in a row in the wrong direction (e.g. downward for an OD) need attention.

So, yes, you need a different set of detection rules.
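As a rough illustration of the "points in a row in the wrong direction" idea, here is a minimal Python sketch. The function name, the `expected` argument, and the default `run_length` of 4 are hypothetical choices for the example; the thread does not give a run length, so pick one from your own process knowledge.

```python
def wrong_direction_runs(values, run_length=4, expected="up"):
    """Return indices where `run_length` consecutive moves opposed the expected trend.

    expected="up" suits a dimension that normally grows with tool wear (e.g. an OD),
    so consecutive decreases are counted. Use expected="down" for the opposite case.
    """
    signals = []
    streak = 0  # consecutive point-to-point moves against the expected direction
    for i in range(1, len(values)):
        diff = values[i] - values[i - 1]
        against = diff < 0 if expected == "up" else diff > 0
        streak = streak + 1 if against else 0
        if streak >= run_length:
            signals.append(i)
    return signals


# Hypothetical OD readings: three rising moves, then four falling moves in a row.
od = [10.00, 10.01, 10.02, 10.03, 10.02, 10.01, 10.00, 9.99, 10.00]
print(wrong_direction_runs(od))  # flags index 7, the fourth consecutive decrease
```

The steady upward moves at the start are ignored, which matches the point above that an upward trend on an OD under tool wear is acceptable.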
 

SPC_Newbie

Thanks - Right, I agree we need a different set, and the one 'rule' you gave, looking for points in a row in the wrong direction, is a great one. Have you or anyone else already compiled additional rules you'd like to share?

Maybe a different set exists for each of the 'typical' histograms we see (skewed towards zero for parallelism, skewed towards an upper constraint for?, etc).

My questions are coming from the precision machining perspective.

THANKS!
 

Bev D

Heretical Statistician
Leader
Super Moderator
All rules come from the general probabilities of 'non-random patterns': shifts, trends and cycles.
Most of the time (across all industries; some industries will have a different distribution of process behavior than others - such as tool wear) the Western Electric Rules are sufficient. Remember that the probabilities of a signal are not precise, nor were they intended to be. If a specific rule doesn't make sense for you, don't use it.

Bob provides a good example of trending and tool wear.

Another example I have (that probably doesn't apply to your circumstance) is that we track customer complaints as a percent of our instrument install base. While this is a useful surrogate for the area of opportunity for a 'complaint', it isn't perfect. We would actually need to know how many times the instruments were used and, in the case of some complaints, whether or not the particular test was run. Some disease states are seasonal, so the area of opportunity can vary by season. This makes the rule of 4 out of 5 points beyond 1 SD of the mean on the same side of the mean a false alarm too often to be useful. Many of our product complaints have a 'slow seasonal roll', so we turn this rule off.

I find that the 'shift' rules are almost always applicable (1 point out, 2 out of 3 near the same limit, and 8 in a row on the same side of the average) as long as the measurement system has sufficient resolution AND the appropriate chart is selected. For example, reacting to 1 point above the upper limit on a c chart that has an average count of less than 0.1 is silly, because the c chart is not meant for such a low defect rate (the normal approximation to the Poisson doesn't hold at such low rates). Of course, the signal is correct: it is telling you that you need a different chart, not a change to the process.
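The c chart remark can be checked with a little arithmetic. The Python sketch below assumes the usual c chart upper limit of c-bar + 3 * sqrt(c-bar) and an exactly Poisson process. The function name and the example averages (0.1 and 0.05) are illustrative choices, not figures from a real process.

```python
import math


def c_chart_false_alarm(c_bar):
    """Return (UCL, probability that one in-control count lands above the UCL)."""
    ucl = c_bar + 3.0 * math.sqrt(c_bar)
    k_max = math.floor(ucl)  # largest whole count that is not above the UCL
    # Poisson probability of a count from 0 through k_max.
    p_within = sum(
        math.exp(-c_bar) * c_bar ** k / math.factorial(k) for k in range(k_max + 1)
    )
    return ucl, 1.0 - p_within


for c_bar in (0.1, 0.05):
    ucl, p = c_chart_false_alarm(c_bar)
    print(f"c-bar={c_bar}: UCL={ucl:.3f}, P(count above UCL)={p:.4f}")
```

With an average of 0.1 the limit is about 1.05, so any count of 2 or more signals, which happens roughly 0.47% of the time. With an average of 0.05 the limit drops below 1, so every single defect is a "signal", roughly 4.9% of the time. Both are well above the roughly 0.135% per side that the normal approximation suggests.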

Choosing not to use certain rules, or invoking custom rules, requires in-depth knowledge of the process at hand, an understanding of the logic of SPC, and a suspension of our natural tendency to want to rationalize variation that we don't want to work on.

You are looking for non-random patterns that would tell you that something has most likely changed. It is that simple and that complicated.
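For readers who want to experiment with switching rules on and off, here is a minimal Python sketch of the shift rules mentioned above. It assumes the center line and sigma are already known and passed in, and it reads "near the same limit" as beyond 2 sigma, the usual Western Electric zone. The function name, the rule labels, and the `use_4_of_5` flag are hypothetical.

```python
def shift_signals(values, center, sigma, use_4_of_5=True):
    """Return (index, rule) pairs for Western Electric style shift signals."""
    z = [(v - center) / sigma for v in values]
    signals = []
    for i, zi in enumerate(z):
        # Rule: 1 point beyond 3 sigma.
        if abs(zi) > 3:
            signals.append((i, "1 beyond 3 sigma"))
        for side in (1, -1):
            # Rule: 2 of 3 consecutive points beyond 2 sigma on the same side.
            last3 = z[max(0, i - 2): i + 1]
            if (len(last3) == 3 and side * zi > 2
                    and sum(side * x > 2 for x in last3) >= 2):
                signals.append((i, "2 of 3 beyond 2 sigma"))
            # Rule: 4 of 5 beyond 1 sigma on the same side (can be turned off).
            last5 = z[max(0, i - 4): i + 1]
            if (use_4_of_5 and len(last5) == 5 and side * zi > 1
                    and sum(side * x > 1 for x in last5) >= 4):
                signals.append((i, "4 of 5 beyond 1 sigma"))
            # Rule: 8 in a row on the same side of the average.
            last8 = z[max(0, i - 7): i + 1]
            if len(last8) == 8 and all(side * x > 0 for x in last8):
                signals.append((i, "8 in a row on one side"))
    return signals


# Hypothetical data with a slow roll above the average.
data = [1.2, 1.5, 0.2, 1.1, 1.3]
print(shift_signals(data, center=0.0, sigma=1.0))
print(shift_signals(data, center=0.0, sigma=1.0, use_4_of_5=False))
```

In this sketch a rule only fires when the current point is itself one of the offending points, which is one common convention. Turning `use_4_of_5` off mirrors the seasonal complaints example, where that rule produced too many false alarms to be useful.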
 