Non-normal Distributions in SPC - How do I Normalize Data?

bobdoering

Stop X-bar/R Madness!!
Trusted Information Resource
I have attached an analysis from Distribution Analyzer (from Taylor Enterprises, variation.com). You can see a comparison of the normal distribution and the distribution that best fits your data ("Smallest Extreme Value Family"). The Smallest Extreme Value Family picks up some of the lower values at the beginning of the run, but those may have been special causes (warm-up?) that should be excluded. That being the case, and assuming there was no human intervention during the run, it appears that your data can be adequately treated as normal.

However, if it is normal because of routine adjustments during the run, or because of measurement error (which by itself will create a normal curve), then those issues need to be addressed. Remember, what you are looking at is Total Variation, which includes process variation, measurement variation and any outside variation (such as human adjustment, warm-up, etc.). You want everything but the process variation minimized to the point of statistical insignificance; then look at the resulting distribution.

At this point, unless further information (such as unrecorded operator adjustment) turns out to be significant, I would say X-bar-R is suitable for your process.
 

Attachments

  • distribution.doc
    113 KB

aproddutoor

Thanks Bob.

The data that I attached was for a lot size of 10000, which usually runs in 3 hours. To achieve randomness, we use C=0 sampling plans to look up the number of samples to inspect for the given lot size; in this case it is 50. I generate 50 random numbers between 1 and 10000 in Excel, and the operator collects data only at those part numbers. So the first sample in my data was collected a while into the run, and I don't know why it measured low. When the machine is set up, we measure the first 5 consecutive parts, and if they are all in, we start the run.

So should I still exclude that sample? There was no human intervention as well.

But when I entered the data in Minitab (see the attachment that I sent you), the Normal Distribution Plot gave a p-value of .022, and you got a p-value of .0819. I don't understand why there is so much difference. Since my p-value was less than .05, I concluded that the distribution is not normal. How did you get a different p-value from mine?

The software that we use also says that my distribution isn't normal. No one would be happier than me if the data really is normal.

Thanks
 

bobdoering

Thanks Bob.

The data that I attached was for a lot size of 10000, which usually runs in 3 hours. To achieve randomness, we use C=0 sampling plans to look up the number of samples to inspect for the given lot size; in this case it is 50. I generate 50 random numbers between 1 and 10000 in Excel, and the operator collects data only at those part numbers. So the first sample in my data was collected a while into the run, and I don't know why it measured low. When the machine is set up, we measure the first 5 consecutive parts, and if they are all in, we start the run.

I do not recommend sampling plans for production inspection. Really, they are for collecting data from product that has already been aggregated - such as incoming receiving. You need to make the sample random in order to make sure there is no time effect (such as samples all out of one box on a skid.)

For production measurement, it makes sense to take time-based samples, such as one part every 5 minutes or three parts every 10 minutes. You want to see the process variation as a function of time. That allows you to track the rate of variation - if there is a rate. Frankly, randomness will just mask incredibly valuable information.
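As a rough illustration of the difference, the sketch below lists which part numbers would be pulled at a fixed time interval versus by a random draw. The lot size (10000) and run time (3 hours) come from the earlier post; the constant production rate and the function names are assumptions made for the example.

```python
import random


def time_based_part_numbers(lot_size, run_minutes, interval_minutes, parts_per_sample=1):
    """Part numbers to pull at fixed time intervals.

    Assumes a constant production rate over the run (an assumption, not
    something stated in the thread).
    """
    parts_per_minute = lot_size / run_minutes
    schedule = []
    t = interval_minutes
    while t <= run_minutes:
        # Last part produced by time t, then the consecutive parts before it.
        last = min(lot_size, round(t * parts_per_minute))
        first = max(1, last - parts_per_sample + 1)
        schedule.append(list(range(first, last + 1)))
        t += interval_minutes
    return schedule


def random_part_numbers(lot_size, n_samples, seed=None):
    """Randomly chosen part numbers, as in an acceptance-sampling draw."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, lot_size + 1), n_samples))


timed = time_based_part_numbers(10000, 180, 5)   # one part every 5 minutes
randomized = random_part_numbers(10000, 50, seed=1)

# Largest stretch of parts with no measurement under each scheme.
timed_ends = [group[-1] for group in timed]
print("time-based max gap:", max(b - a for a, b in zip(timed_ends, timed_ends[1:])))
print("random max gap:    ", max(b - a for a, b in zip(randomized, randomized[1:])))
```

The time-based schedule keeps the order and spacing of the measurements meaningful, which is what lets a chart show drift or warm-up effects; the random draw can leave long stretches of the run unobserved.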

As far as the low data, it would be good to know if there is any expectation of variation from machine warm-up (or a similar cause) - which a capability study would readily show you. That is important information, because it is variation that will show up any time the process stops - breaks, repairs, PM, etc. - and is an expected special cause.


So should I still exclude that sample? There was no human intervention as well.

For customer capability reporting, yes - it is data that they may expect to see in their incoming product.

But when I entered the data in Minitab (see the attachment that I sent you), the Normal Distribution Plot gave a p-value of .022, and you got a p-value of .0819. I don't understand why there is so much difference. Since my p-value was less than .05, I concluded that the distribution is not normal. How did you get a different p-value from mine?

The software that we use also says that my distribution isn't normal. No one would be happier than me if the data really is normal.

You really need to have a basic expectation of your process as to whether it should be normal or not. Natural variation (grass growing, bread loaf height out of an oven) is generally normal. If your process is similar to that kind of variation, then you should expect to see a normal distribution. Processes with a series of non-normal distributions affecting them may also have an overall normal total variation. But there are processes, such as precision machining, where the data may say it is normal, but it is actually a combination of overcontrol and measurement error. You cannot rely on the data alone to tell you whether a process is normal or not. Everybody hopes for that magic Minitab bullet, but it is more of a crutch. It needs a little pondering.

As far as the variation in the p value, that may simply be because I just dumped the data in as individual values, not as pairs. (I was in a hurry to get a quick view.) I wanted a down and dirty evaluation.
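To see how the form of the data alone can move a normality p-value, here is a minimal sketch using only the Python standard library. It computes an Anderson-Darling statistic (the test Minitab reports by default on its normality plot) with the commonly cited D'Agostino-Stephens p-value approximation, once on individual values and once on subgroup means. The data are synthetic and mildly skewed, not the data from this thread, and the function name is made up for the example; the sketch does not assume that Distribution Analyzer uses the same test.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(values):
    """Return (adjusted A-squared, approximate p-value) for normality.

    Mean and standard deviation are estimated from the data. The p-value
    uses the D'Agostino-Stephens approximation, so treat it as approximate.
    """
    x = sorted(values)
    n = len(x)
    m, s = mean(x), stdev(x)
    std_normal = NormalDist()
    eps = 1e-12  # clamp the CDF away from 0 and 1 so the logs stay finite
    cdf = [min(max(std_normal.cdf((v - m) / s), eps), 1 - eps) for v in x]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    a2_adj = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    if a2_adj >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_adj + 0.0186 * a2_adj ** 2)
    elif a2_adj > 0.34:
        p = math.exp(0.9177 - 4.279 * a2_adj - 1.38 * a2_adj ** 2)
    elif a2_adj > 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2_adj - 59.938 * a2_adj ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2_adj - 223.73 * a2_adj ** 2)
    return a2_adj, min(max(p, 0.0), 1.0)


# Synthetic, mildly skewed data in 50 subgroups of 2 (NOT the thread's data).
rng = random.Random(0)
pairs = [(rng.lognormvariate(0, 0.4), rng.lognormvariate(0, 0.4)) for _ in range(50)]
individuals = [v for pair in pairs for v in pair]
pair_means = [mean(pair) for pair in pairs]

print("individual values:", anderson_darling_normal(individuals))
print("subgroup means:   ", anderson_darling_normal(pair_means))
```

Averaging within subgroups pulls the data toward normal and halves the sample count, so the same measurements can land on different sides of the .05 line depending on how they were entered. Different packages may also use different normality tests, which is a second possible source of disagreement.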

I did not look at the sample variation (range) to see how that related to the process variation. Sometimes it indicates that the measurement device or technique is providing a lot of the variation. You should have some idea how much of the total variation in your process data comes from measurement.
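A minimal sketch of that bookkeeping, assuming the sources of variation are independent so that their variances (not their standard deviations) add. The numbers are hypothetical, and the measurement standard deviation would normally come from a separate gauge study:

```python
import math


def process_sd_from_total(total_sd, measurement_sd):
    """Estimate the process standard deviation from total and measurement
    standard deviations, assuming independent sources (variances add)."""
    process_var = total_sd ** 2 - measurement_sd ** 2
    if process_var < 0:
        raise ValueError("measurement variation exceeds total variation")
    return math.sqrt(process_var)


def measurement_share(total_sd, measurement_sd):
    """Fraction of the total variance that is due to measurement."""
    return measurement_sd ** 2 / total_sd ** 2


# Hypothetical values for illustration only.
total_sd = 0.005
measurement_sd = 0.003

print(process_sd_from_total(total_sd, measurement_sd))  # about 0.004
print(measurement_share(total_sd, measurement_sd))      # about 0.36
```

In this made-up case roughly a third of the observed variance is the gauge, not the process, which would be worth fixing before reading much into the shape of the distribution.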
 

aproddutoor

The customer doesn't care how we collect our data but they expect us to follow the C=0 Sampling Plans and determine the number of samples that need to be inspected for a given lot size. So we have to follow that for sure.

I'm trying to implement SPC in our company from what I studied in Grad School but I realized that not everything can be done according to the book.
The forums give me a chance to interact with the experts in the field.
 

Bev D

Heretical Statistician
Leader
Super Moderator
This is a classic misunderstanding
  1. SPC is NOT Acceptance Sampling; Random Acceptance Sampling samples should NOT be used as SPC subgroups
    • Acceptance sampling is done randomly to determine if the lot does not exceed some defect rate
    • SPC subgroups should NOT BE RANDOM: they should be pulled at specific preset time intervals
    • SPC is performed to determine and maintain the stability of the process and as a first step in improving the capability of the process
  2. The underlying process does NOT have to be Normal for SPC to work:
    • The process stream must be homogeneous - or nearly homogeneous - for the traditional charts to 'work'
    • If the process stream is not homogeneous (you can see specific trends, shifts, etc. that have an obvious source), you must adjust your chart type or charting strategy to accommodate the natural systemic causes, provided their presence poses no detriment to performance. Examples are lot-to-lot average shifts, machine-to-machine average shifts, fast tool wear (a systematic trend up or down), slow tool wear (autocorrelation from piece to piece in the short run), cavity-to-cavity differences, etc.
  3. Most processes are just NOT IN CONTROL when we first try to plot them on an SPC Chart. As Bob mentioned, go to a set time based sampling system and understand what is happening to generate the variation

By the way, most software doesn't attempt to determine the distribution before calculating capability indices. It just assumes the process is normal. Some software is well thought out and programmed to do this, and if it is, it typically will list the distribution assumption. If it isn't, you have to intervene in some fashion.
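To make the acceptance sampling point concrete: a C=0 plan accepts the lot only if the random sample contains zero defectives, so it speaks to the lot's defect rate and says nothing about how the process moved over time. Below is a minimal sketch of the acceptance probability using the hypergeometric distribution. The lot size (10000) and sample size (50) are the ones mentioned earlier in the thread; the defect rates and the function name are assumptions for illustration.

```python
from math import comb


def prob_accept_c0(lot_size, sample_size, defectives_in_lot):
    """Probability that a random sample drawn without replacement contains
    zero defectives (hypergeometric), i.e. that a C=0 plan accepts the lot."""
    if defectives_in_lot > lot_size - sample_size:
        return 0.0  # not enough good parts to fill the sample
    good = lot_size - defectives_in_lot
    return comb(good, sample_size) / comb(lot_size, sample_size)


# Hypothetical lot quality levels, in percent defective.
for pct in (0.1, 0.5, 1.0, 5.0):
    defectives = round(10000 * pct / 100)
    p_accept = prob_accept_c0(10000, 50, defectives)
    print(f"{pct}% defective: P(accept) = {p_accept:.3f}")
```

The result is a pass/fail statement about an already aggregated lot. The order and timing of the parts play no role in it, which is why the same samples make poor SPC subgroups.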
 

aproddutoor

This is a classic misunderstanding
SPC is NOT Acceptance Sampling; Random Acceptance Sampling samples should NOT be used as SPC subgroups
  • Acceptance sampling is done randomly to determine if the lot does not exceed some defect rate
  • SPC subgroups should NOT BE RANDOM: they should be pulled at specific preset time intervals
  • SPC is performed to determine and maintain the stability of the process and as a first step in improving the capability of the process
I agree that Acceptance Sampling should not be used but this has been going on for years. I joined a couple of months ago. If I don't use acceptance sampling, how many samples should I collect for each process? What does it depend upon? Lot size?

We're trying to get an upgrade for the software as well. I was looking at InfinityQS, they had something similar to what you are saying. Collect a sample after so and so minutes.



Some software is well thought out and programmed to do this and if it is it typically will list the distribution assumption. If it isn't, you have to intervene in some fashion.

Can you tell me which software will list the distribution?
Which is the best software in the market right now?

I had a web meeting yesterday with InfinityQS. I liked it but it doesn't list the distribution as you said. They have Johnson transformation to transform the data if it is not normal.
 

bobdoering

The customer doesn't care how we collect our data but they expect us to follow the C=0 Sampling Plans and determine the number of samples that need to be inspected for a given lot size. So we have to follow that for sure.

I'm trying to implement SPC in our company from what I studied in Grad School but I realized that not everything can be done according to the book.
The forums give me a chance to interact with the experts in the field.

Bev's points are well taken.

As far as your customer requirements, it reminds me of medical requirements, where they expect a final inspection at a specified sampling rate. Much different than process sampling - especially with any hope of contributing to control.

The more progressive medical customers have actually realized there is this SPC thing, and offer it as an alternative to acceptance sampling, if capability is assured.

I'm trying to implement SPC in our company from what I studied in Grad School but I realized that not everything can be done according to the book.
The forums give me a chance to interact with the experts in the field.


Fantastic! You are on the right track already if you are:

A) Questioning what you are reading (especially in journal articles)

and

B) Here to get "the rest of the story"!

Congratulations!
 

aproddutoor

We are in the medical device industry and that's how it works here. When I joined, I first asked myself why we are collecting samples according to C=0 sampling plans, which are meant for inspection. It seems that the customers who ask us to follow C=0 don't even follow it within their own companies.

I don't see any sub-group variation in the data that I posted. So can I stop collecting data in sub-groups? The customer requirement is the Ppk, and I was conducting a study to determine whether there was any sub-group variation.
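For reference on the Ppk question: Ppk is conventionally computed from the overall standard deviation of all the measurements, so the calculation itself does not depend on how the data are subgrouped (unlike Cpk, which uses a within-subgroup estimate). A minimal sketch, with made-up measurements and spec limits, and with the usual assumption that the data are close to normal:

```python
from statistics import mean, stdev


def ppk(values, lsl, usl):
    """Ppk from the overall sample standard deviation of all values.

    Assumes an approximately normal distribution; for clearly non-normal
    data this index can be misleading.
    """
    m = mean(values)
    s = stdev(values)  # overall (n - 1) standard deviation, no subgrouping
    return min(usl - m, m - lsl) / (3 * s)


# Hypothetical measurements and specification limits.
data = [10.02, 9.98, 10.01, 10.00, 9.99, 10.03, 9.97, 10.00]
print(round(ppk(data, lsl=9.90, usl=10.10), 2))  # about 1.67
```

Subgroups still matter for the control chart itself, since that is where within-subgroup and between-subgroup variation are separated.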
 

bobdoering

If I don't use acceptance sampling, how many samples should I collect for each process? What does it depend upon? Lot size?

This depends on the rate of variation and the risk of special causes. For example, I had a grinding process that would require adjustment once per week when controlled correctly. I divided that by 7 (7 points makes a run!) to get the frequency. Noting that it would be about once a day, I upped it to every 4 hours, in case there was a special cause. You don't want to sort a zillion parts because you did not accommodate special causes as a part of your sampling scheme!! Lot size is irrelevant for ongoing processes. For short run, you might want to divide the number of parts by 10 as a sampling rate.
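The arithmetic in that example can be sketched as below. The function names are made up, a week is taken as 168 running hours (an assumption), and the short-run rule is read as "check every (parts / 10)th part", which is one interpretation of the sentence above.

```python
def sampling_interval_hours(hours_between_adjustments, points_per_run=7):
    """Starting sampling interval: the expected time between adjustments
    divided by the number of points that make a run."""
    return hours_between_adjustments / points_per_run


def short_run_sampling_step(parts_in_run, samples=10):
    """For a short run, check every k-th part so that roughly `samples`
    checks are spread across the run."""
    return max(1, parts_in_run // samples)


# One adjustment per week, assuming round-the-clock running (168 hours).
baseline = sampling_interval_hours(7 * 24)
print(baseline)  # 24.0 hours, i.e. about once a day

# Tightening from the baseline (every 4 hours in the example above) is a
# judgment call based on the risk of special causes, not a formula.

print(short_run_sampling_step(500))  # every 50th part
```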

How many parts per sample? It depends. For precision machining, 1 part. For others, 3 or 5. For normal distributions, the sample should represent the major part of the entire variation of the process. So, if your process is truly normal, your range should be pretty much the same from sample to sample (with intrinsic variation from measurement error). If your range is much smaller than your grand average range, then either your process is not truly normal, or the sample is too small. Remember - a normal distribution process is one that can be set at one target value, and it will stay there (with some variation about that target) without any operator intervention!
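One way to look at the ranges side by side is to compute each subgroup's range, the average range, and the usual X-bar/R limits. This is a minimal sketch using the standard Shewhart constants for subgroup sizes 2 to 5; the subgroup values and the function name are hypothetical.

```python
from statistics import mean

# Standard Shewhart control chart constants for subgroup sizes 2 to 5.
CONSTANTS = {
    2: {"A2": 1.880, "D3": 0.0, "D4": 3.267},
    3: {"A2": 1.023, "D3": 0.0, "D4": 2.574},
    4: {"A2": 0.729, "D3": 0.0, "D4": 2.282},
    5: {"A2": 0.577, "D3": 0.0, "D4": 2.114},
}


def xbar_r_limits(subgroups):
    """Center lines and control limits for X-bar and R charts.

    Every subgroup must have the same size (2 to 5 in this sketch).
    """
    n = len(subgroups[0])
    if any(len(group) != n for group in subgroups):
        raise ValueError("all subgroups must have the same size")
    c = CONSTANTS[n]
    xbars = [mean(group) for group in subgroups]
    ranges = [max(group) - min(group) for group in subgroups]
    grand_mean = mean(xbars)
    r_bar = mean(ranges)
    return {
        "xbar_center": grand_mean,
        "xbar_lcl": grand_mean - c["A2"] * r_bar,
        "xbar_ucl": grand_mean + c["A2"] * r_bar,
        "r_center": r_bar,
        "r_lcl": c["D3"] * r_bar,
        "r_ucl": c["D4"] * r_bar,
        "ranges": ranges,
    }


# Hypothetical subgroups of 3 parts each.
example = [[10.01, 10.03, 9.99], [10.00, 10.02, 10.04], [9.98, 10.01, 10.00]]
limits = xbar_r_limits(example)
for key, value in limits.items():
    print(key, value)
```

Comparing the individual entries in `ranges` with `r_center` is a quick check on whether the within-sample spread is roughly constant from sample to sample.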



We're trying to get an upgrade for the software as well. I was looking at InfinityQS, they had something similar to what you are saying. Collect a sample after so and so minutes.

The software I use for distribution analysis is...well...Distribution Analyzer by Taylor Enterprises. Its capabilities are listed at the website, and you can even try it out for free. Best thing is it is fabulously easy to use!!!

( I do not own Taylor Enterprises...in case one wondered.)

Find out if the SPC software you want to use really works for your process. You might be wasting your money. As Bev states, your data does not have to be normal for SPC to work - but it does affect which specific charts work or not. For example, X-bar-R charts are abjectly useless for most precision machining control. But there is a technique that works - very well. Most software on the market today is incorrect for precision machining - and paper charts are about all one can do correctly. (They claim it is too special of a process to do the correct SPC for precision machining - yeah, right...) So...buyer beware.
 