What's the rationale for a sample size of 30 for capability analysis?


Abhijeet

What's the rationale behind the sample size requirement of 30 for capability analysis? It is said that 30 takes the sample closer to the normal distribution. But if the population has a non-normal distribution and we draw statistical inferences by "making" the data normal, how are the conclusions applicable to a population with a non-normal distribution?
 

Tim Folkerts

Trusted Information Resource
Re: Sample size of 30 for capability analysis

It's not that 30 samples will make the sample look normal despite the original distribution being far from normal. It's that if you draw 30 pieces and the original distribution is indeed normal, then the sample should look at least roughly normal.

As with most things in statistics, the number 30 is just a rule of thumb. It is a big enough sample to be reasonably sure the results are in the right ballpark, but small enough to be practical.

Roughly speaking, with a sample size of 30, the 95% confidence interval for the mean extends about 1/3 of a standard deviation on either side of the sample mean. So if your sample produced a mean of 10 and a standard deviation of 3, then you can be 95% sure the "true" mean is between 9 and 11.
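For what it's worth, a quick Python sketch (using the hypothetical mean and standard deviation from the example above) reproduces that arithmetic:

```python
# Rough check: with n = 30, the 95% confidence interval for the mean
# extends about s/3 on either side of the sample mean.
from scipy import stats

n = 30
mean, s = 10.0, 3.0                    # sample statistics from the example
t_crit = stats.t.ppf(0.975, df=n - 1)  # two-sided 95% critical value (~2.05)
half_width = t_crit * s / n ** 0.5     # ~0.37 * s, roughly s/3

print(f"95% CI: {mean - half_width:.2f} to {mean + half_width:.2f}")
# -> 95% CI: 8.88 to 11.12, close to the 9-to-11 ballpark quoted above
```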


Tim F
 

Darius

Re: Sample size of 30 for capability analysis

I believe that has something to do with the central limit theorem:

http://www.qualityamerica.com/knowledgecente/articles/cqeIVH1a.html

Irrespective of the shape of the distribution of the population or universe, the distribution of average values of samples drawn from that universe will tend toward a normal distribution as the sample size grows without bound.

But beware: it applies only to averages, not to individual points.
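A small simulation (just a sketch, using an exponential population for illustration) makes that distinction visible:

```python
# Sample *means* from a skewed population look roughly normal;
# individual points keep the skewed shape of the population.
import numpy as np

rng = np.random.default_rng(0)
draws = rng.exponential(scale=1.0, size=(100_000, 30))

individuals = draws[:, 0]        # single points: still skewed
means = draws.mean(axis=1)       # means of n = 30: much closer to normal

for name, x in [("individuals", individuals), ("means of 30", means)]:
    skew = ((x - x.mean()) ** 3).mean() / x.std() ** 3
    print(f"{name:12s} skewness = {skew:+.2f}")
# individuals stay near the population skewness of +2;
# the means drop to about +2/sqrt(30), roughly +0.37
```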

IMHO, with 30 samples the statistics are normally stable (you can almost trust the values). With a small sample, the measures of position (mean, median, etc.) and of variation fluctuate around the central value, but as the sample grows, the fluctuation of the resulting estimate gets smaller. I agree that this is not always enough; I trust Grubbs outlier removal more for obtaining a good estimate. In some cases I have found that outliers pulled the average far from the value obtained after 500,000 samples, which was the same as the value obtained from 1,000 samples with outlier removal.
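For reference, here is a minimal sketch of one pass of the Grubbs test mentioned above (two-sided, alpha = 0.05, made-up data); in practice you would iterate until no further outlier is found:

```python
import numpy as np
from scipy import stats

def grubbs_outlier(x, alpha=0.05):
    """Index of the most extreme point if it is a Grubbs outlier, else None."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    g = np.abs(x - x.mean()).max() / x.std(ddof=1)   # Grubbs statistic
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)      # critical t value
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return int(np.abs(x - x.mean()).argmax()) if g > g_crit else None

data = [9.8, 10.1, 9.9, 10.2, 10.0, 9.7, 14.9]   # made-up sample
print(grubbs_outlier(data))                       # -> 6 (the 14.9 point)
```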

Be careful with the indicators: most of them use the mean or the standard deviation, and keep in mind that using the mean (and likewise the standard deviation) as a summary assumes the distribution is normal. On the other hand, don't worry, don't take it too personally; they are just estimates. Remember: the more samples, the more stable the indicator.
 

Abhijeet

Re: Sample size of 30 for capability analysis - Whats the rationale?

Thanks, Tim, for the prompt reply.
What I am not able to understand is that in capability analysis the emphasis is placed on the data being normally distributed, and you are asked to use techniques like the Box-Cox transformation to convert a non-normal distribution into a normal one. Why is that? How can we truly say the capability is indeed what we get after transforming the non-normal data with a Box-Cox transformation?
 

Tim Folkerts

Trusted Information Resource
Re: Sample size of 30 for capability analysis - Whats the rationale?

At some level, capability can be calculated independently of the distribution. You can always calculate the mean and standard deviation even if you have no idea what the distribution is. From there, you can calculate the various capability indices by simply following the equations. For example, Cpk is just how many standard deviations you are from the closer spec limit divided by 3.
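To make that concrete, a minimal sketch (with made-up data and spec limits) might look like:

```python
import statistics

def cpk(data, lsl, usl):
    """Cpk: distance from the mean to the nearer spec limit, in units
    of 3 standard deviations, exactly as described above."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)   # sample standard deviation
    return min(usl - mean, mean - lsl) / (3 * s)

sample = [10.1, 9.9, 10.3, 10.0, 9.8, 10.2, 10.1, 9.9, 10.0, 10.2]
print(f"Cpk = {cpk(sample, lsl=9.0, usl=11.0):.2f}")   # -> Cpk = 2.00
```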

The challenge comes when you want to interpret the results and make predictions. Because the normal distribution is both common in real life and well-studied mathematically, it is a handy starting point. If your data follows the normal distribution, then there are lots of techniques for predicting things like ppm from the statistics.

If the data doesn't follow the normal distribution, then you basically have two choices:
  • develop the mathematical tools to make predictions based on the actual distribution, or
  • change the data to make it behave like a normal distribution.
Box-Cox transformations are an attempt at the second approach. By making the data reasonably close to normal, calculations based on the normal distribution will be reasonably close to correct.
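As a hedged sketch of that second approach, scipy's Box-Cox routine picks the power transform that makes skewed (strictly positive) data most nearly normal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
raw = rng.lognormal(mean=0.0, sigma=0.5, size=200)   # skewed, non-normal data

transformed, lam = stats.boxcox(raw)   # lambda estimated by maximum likelihood
print(f"fitted lambda = {lam:.2f}")
print(f"skewness before = {stats.skew(raw):+.2f}, after = {stats.skew(transformed):+.2f}")
# Capability calculations are then done on the transformed data,
# with the spec limits transformed the same way.
```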


Tim F
 

bobdoering

Stop X-bar/R Madness!!
Trusted Information Resource
Re: Sample size of 30 for capability analysis - Whats the rationale?

For precision machining and its associated non-normal uniform distribution (which has no dependence on the mean whatsoever, and for which the standard deviation has little to offer), you cannot predict a minimum number. It depends on the tool wear rate and on how many parts it takes to generate at least one cycle of the sawtooth curve (unless you are willing to extrapolate based on the tool wear rate determined from a sample). For example, if it takes a week and 3,000 parts for the tool to wear from the lower control limit to the upper control limit, then 3,000 is the minimum; but if it takes 5 parts to wear at that rate, then 25 parts will give you 5 cycles, which is more than enough data. A little simplistic, but it should illustrate the point effectively.

It is critical to collect the data in time order to evaluate the tool wear rate. This is not a random sampling situation, as tool wear is a dependent (not independent) function of time. As a dependent function, the CLT does not apply either.
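The arithmetic in that example reduces to a one-liner (numbers purely illustrative):

```python
# Minimum sample = enough parts to cover the desired number of
# full sawtooth (tool wear) cycles, collected in time order.
def min_sample(parts_per_cycle, cycles_wanted=1):
    return parts_per_cycle * cycles_wanted

print(min_sample(3000))    # slow wear: 3000 parts for one full cycle
print(min_sample(5, 5))    # fast wear: 25 parts cover five cycles
```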

For more information, see: Statistical process control for precision machining
 