View Full Version : Control Limits vs. Confidence Intervals - What range can we be 95% confident?


sammy.storm
25th April 2005, 07:10 PM
I have recently started a new job, and am struggling with what is being asked.

I have been asked to calculate Confidence Intervals for a distribution of data, to evaluate the risk of going out of spec. My understanding is that a Confidence Interval gives a range in which the "true" mean is expected to lie, based on the mean calculated from the sample (i.e. mean = 2.00, and the "true" mean would be 2.00 +/- 0.04 at 95% confidence).

I calculated the CI above with Excel, and manually, but I'm not clear how this would help determine the risk of being "out of spec". My Std Dev is 0.25, so my +/- 3 sigma is from 1.25 to 2.75.

Isn't the +/- 3 sigma of the Normal distribution curve a type of Confidence Interval? i.e. 99.7% will be within this range.

I guess the question they want answered is: What range can we be 95% confident that 95% will be in spec?

Can the 95% confidence interval calculated in Excel be added to each end of the +/- 2 sigma?

Sorry if this is a dumb question, but I appreciate any input.

Tim Folkerts
25th April 2005, 10:17 PM
Sammy,

Welcome to the Cove!

I'll take a couple stabs at some advice and then we can see if we are on the same wavelength.

First, analysis usually assumes a normal distribution. If that's not (at least approximately) the case, then the following will not be (at least approximately) correct. :caution:

It looks to me like you are starting on the right track. If you measure 150 pieces and get x-bar = 2.00 and s = 0.25, then the 95% confidence interval for the mean is 2.00 +/- 1.96*s/n^0.5 = 2.00 +/- 0.04.

OTOH, you would expect 95% of the pieces to fall in the range 2.00 +/- 1.96*s, which is about 1.5 - 2.5. 99.7% would be in the range 1.25 - 2.75
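
As a quick cross-check of the two ranges above, here is a minimal Python sketch using only the standard library. It assumes the data are normal and reuses the example figures (n = 150, x-bar = 2.00, s = 0.25); the variable names are purely illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s = 150, 2.00, 0.25          # example figures from the post
z = NormalDist().inv_cdf(0.975)       # two-sided 95% z-value, about 1.96

# 95% confidence interval for the MEAN: gets narrower as n grows
ci_half_width = z * s / sqrt(n)
ci_mean = (xbar - ci_half_width, xbar + ci_half_width)

# Range expected to hold about 95% of INDIVIDUAL pieces: does not shrink with n
spread_95 = (xbar - z * s, xbar + z * s)

print(f"95% CI for the mean: {ci_mean[0]:.2f} to {ci_mean[1]:.2f}")
print(f"~95% of pieces:      {spread_95[0]:.2f} to {spread_95[1]:.2f}")
```

The point of putting the two side by side is that only the first range depends on the sample size.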

It sounds like you want the odds of being in a specific range (to meet some imposed specs). To find the probability of being below the lower spec limit, find
z = (LSL- x-bar)/s
and look on a standard table. Or use the Excel NORMSDIST function. Similarly, to find the probability of being too high, find
z = (USL- x-bar)/s
and take 1 - (value from the table). Add these two to get the probability of being out of spec.
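
The same z-table lookup can be sketched in Python, with the standard library's NormalDist standing in for the printed table or Excel's NORMSDIST. The spec limits below are hypothetical (none are given in the thread), and the function name is only illustrative.

```python
from statistics import NormalDist

def fraction_out_of_spec(xbar, s, lsl, usl):
    """Estimated fraction outside [lsl, usl], assuming normality and
    treating xbar and s as if they were the true mean and sigma."""
    std_normal = NormalDist()
    p_low = std_normal.cdf((lsl - xbar) / s)         # below the lower spec limit
    p_high = 1 - std_normal.cdf((usl - xbar) / s)    # above the upper spec limit
    return p_low + p_high

# Hypothetical spec limits, placed at x-bar +/- 3 s purely for illustration
print(fraction_out_of_spec(xbar=2.00, s=0.25, lsl=1.25, usl=2.75))
```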

Now, these calculations assume that the values for the mean & stdev are correct. As you point out, there is some uncertainty in the mean. Perhaps more importantly, there is an uncertainty in the standard deviation. For the example above, the 95% confidence interval for the standard deviation would be roughly +/- 0.03, so you are 95% certain that the true sigma is about 0.22 - 0.28. Clearly, if the sigma is bigger than you estimated (0.25), then the fraction outside the specs would be much bigger.
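
For a rough feel of the uncertainty in the standard deviation, here is a small sketch based on the large-sample approximation s +/- z*s/sqrt(2(n-1)). This is an approximation chosen to keep the example to the standard library; the exact interval comes from the chi-square distribution and will differ slightly. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def sigma_ci_approx(s, n, confidence=0.95):
    """Large-sample approximate CI for sigma: s +/- z * s / sqrt(2 * (n - 1)).
    An approximation only; the exact interval uses the chi-square distribution."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s / sqrt(2 * (n - 1))
    return s - half_width, s + half_width

low, high = sigma_ci_approx(s=0.25, n=150)
print(f"approx. 95% CI for sigma: {low:.3f} to {high:.3f}")
```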



Does something in there help? Or did I go off on a tangent and miss what you were looking for?

Tim F

sammy.storm
26th April 2005, 09:02 AM
Tim,

Thanks for the response. That is how I understand the use of those calcs, but I was asked what the confidence is in the control limits.

What I was wondering is whether I am supposed to, or able to, add the confidence interval to the outside of the control limits to say "I am 95% confident that 99.7% of the population will fall within this range."

I'm not sure this is a valid use of the confidence interval.

Thanks much,

Sammy
:thanx:

Dave Strouse
26th April 2005, 09:27 AM
sammy.storm -

Sounds like you are being asked to find a Tolerance interval i.e. an interval where with gamma probability (confidence), some P (proportion) of the distribution will be found.

The venerable Juran handbook 5th ed., pp 44.47 to 44.54 will give you more info than you probably care to know, including four methods of calculation and tables for the factors needed in the appendix.

BTW, the one I have used most is a distribution-free method (Type IV). Here you only need to assume statistical independence and random sampling (always needed for any technique!). Then take the extremes of the sample for your interval. For example, from table W (reference above), if you take 1000 samples the high and low values are a range with 95% confidence to contain 99.7%. If you want a 95% interval on 97% of the population, only about 150 samples are required.
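
The distribution-free idea can be sketched with the standard order-statistic formulas: for n independent draws from a continuous population, the confidence that the sample minimum and maximum together bracket at least a proportion P is 1 - n*P^(n-1) + (n-1)*P^n, and the confidence that the maximum alone bounds P from one side is 1 - P^n. The function names below are illustrative, and this is not a reproduction of the Juran table.

```python
def two_sided_confidence(n, proportion):
    """Confidence that sample min..max of n independent draws contains
    at least `proportion` of a continuous population."""
    p = proportion
    return 1 - n * p ** (n - 1) + (n - 1) * p ** n

def one_sided_confidence(n, proportion):
    """Confidence that at least `proportion` of the population lies
    below the sample maximum (or, by symmetry, above the minimum)."""
    return 1 - proportion ** n

def min_sample_size(proportion, confidence=0.95):
    """Smallest n whose min..max interval reaches the requested confidence."""
    n = 2
    while two_sided_confidence(n, proportion) < confidence:
        n += 1
    return n

for n, p in [(150, 0.97), (1000, 0.997)]:
    print(f"n={n}, P={p}: two-sided {two_sided_confidence(n, p):.3f}, "
          f"one-sided {one_sided_confidence(n, p):.3f}")
print("n needed for 95% confidence / 95% proportion:", min_sample_size(0.95))
```

Under these formulas the 150-sample figure is close to the two-sided case, while 1000 samples for 99.7% comes out near 95% only in the one-sided case (about 80% two-sided), so it is worth checking which version a printed table is giving.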

The other methods for this will require fewer samples, but are heavily dependent on the normality assumption.

Dave

Tim Folkerts
26th April 2005, 02:57 PM
Besides the suggestion by Dave to look at Juran's book, you might try the online stats handbook from NIST (US National Institute of Standards & Technology). A link to the page on tolerance intervals:

http://www.itl.nist.gov/div898/handbook/prc/section2/prc263.htm

The handbook is aimed at industrial stats, and has good sections on all sorts of useful stats, from the very basics up through advanced topics like DOE.


Tim F

sammy.storm
28th April 2005, 09:56 AM
Tim/ Dave,

Thanks so much. I think all this will help justify using the +/-3 sigma and CpK as I always have.

I also really like the link listed above.

Sammy

Steve Prevette
28th April 2005, 08:10 PM
Please be aware that the control limits on a control chart are not probability limits, nor are they based upon the normal distribution. There is much discussion on this available in Dr. Wheeler's texts, and also in Dr. Deming's books.

One thing to recognize is that a number of rules are generally used simultaneously, such as 7 in a row on the same side of the average, along with simple points outside the control limits. Also, the Chebyshev inequality bounds the probability of being more than three standard deviations from the average at no more than about 11% (not 3 in 1000). Dr. Shewhart did a great deal of empirical trials and came up with the best compromise between false alarms and failure to detect at 3 standard deviations. Nothing to do with the confidence interval of a single estimate.
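
To put numbers on the two figures mentioned above, this short sketch compares the Chebyshev bound 1/k^2, which holds for any distribution with a finite variance, against the tail area that would apply if the data really were normal. The function names are illustrative.

```python
from statistics import NormalDist

def chebyshev_bound(k):
    """Upper bound on P(|X - mean| >= k sigma) for any distribution
    with a finite variance."""
    return 1 / k ** 2

def normal_tail(k):
    """P(|X - mean| > k sigma) if X is exactly normal."""
    return 2 * (1 - NormalDist().cdf(k))

k = 3
print(f"Chebyshev bound at {k} sigma: {chebyshev_bound(k):.3f}")
print(f"Normal tail area at {k} sigma: {normal_tail(k):.4f}")
```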

AJRowe
20th April 2009, 01:09 PM
This is an old post so I'm not sure if my question will get picked up or not. I'm interested in the statement "OTOH, you would expect 95% of the pieces to fall in the range 2.00 +/- 1.96*s, which is about 1.5 - 2.5. 99.7% would be in the range 1.25 - 2.75"

I'm trying to understand the statistical basis for claiming one would expect 95% of samples to fall within the range of "Mean +/- t-value * StdDev"

Thanks

Steve Prevette
20th April 2009, 01:12 PM
The claim is only valid if the ASSUMPTION that the data fits the normal distribution is true. If it is true, you look up z values for large samples, and use t distribution values for small samples (<25). Many people will invoke the "Central Limit Theorem" (or the Law of Large Numbers) to justify normality, but this can cause error.

AJRowe
20th April 2009, 01:31 PM
Thanks for the quick response. I guess I don't understand the full usefulness of the t-table. What is it about multiplying one standard deviation by the t-value that takes the distribution estimate from 68% to 95%? In my limited understanding of statistics I thought the t-table just gave us a plus/minus range around a sample size's mean for small sample sizes. I don't understand how to make that leap to using it to estimate the range of 95% of the sample.

Steve Prevette
20th April 2009, 01:32 PM
Best thing to do is find a good stats book and read up on the "Student T Distribution". Short of that, consult Google. There is a reasonable Wikipedia entry at

http://en.wikipedia.org/wiki/Student's_t-distribution

Erik Alburg
20th April 2009, 06:13 PM
The 95% number comes from the characteristics of the normal distribution, which describe what percentage of the distribution lies within +/- 1, 2, and 3 standard deviations. Sometimes this is stated as the 68-95-99.7 rule, where:

68.27% of the distribution falls between +/- 1 standard deviation
95.45% of the distribution falls between +/- 2 standard deviations
99.73% of the distribution falls between +/- 3 standard deviations

So looking at the mean +/- 1.96 * standard deviation means that ~95% of the distribution will fall between those limits.
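
These percentages can be reproduced directly from the normal CDF. A minimal sketch, assuming an exactly normal distribution (the function name is illustrative):

```python
from statistics import NormalDist

def normal_coverage(k):
    """Fraction of a normal distribution within +/- k standard deviations."""
    std_normal = NormalDist()
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"+/- {k} sigma: {normal_coverage(k):.2%}")
```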

That is IF the data is normal but the problem with any control chart is that they are designed to be effective regardless of the distribution and rely on the central limit theorem.

So to be a valid conclusion you must test for normality before you make this statement based on a SPC chart.

AJRowe
20th April 2009, 06:41 PM
So is it just the fact that 1.96 is close to 2 and 2 standard deviations is 95% of a normal distribution that makes the original statement true? Or is it really the case that the t-value multiplied by the standard deviation gives you some kind of confidence interval on the distribution?

Let me ask it another way. In the original example, if instead of 150 pieces there were only 4 pieces. The t-value would be 3.182. If the x-bar = 2.00 and s = 0.25 stayed the same (just for example) would you still say we would expect 95% of the pieces to fall in the range 2.00 +/- 3.18*s, which is about 1.2 - 2.8? Or since the t-value is up to 3 we would say 99% of the pieces to fall within that range?

I understand the role of the t-value in calculating the confidence interval for the mean, my question is what does it have to do with calculating a confidence in the range?

Thanks.

Steve Prevette
20th April 2009, 07:13 PM
So is it just the fact that 1.96 is close to 2 and 2 standard deviations is 95% of a normal distribution that makes the original statement true?

SSP: Yes, assuming the variance is known, or you have a very large sample.

Or is it really the case that the t-value multiplied by the standard deviation gives you some kind of confidence interval on the distribution?

SSP: Yes, when it is a small sample.

Let me ask it another way. In the original example, if instead of 150 pieces there were only 4 pieces. The t-value would be 3.182. If the x-bar = 2.00 and s = 0.25 stayed the same (just for example) would you still say we would expect 95% of the pieces to fall in the range 2.00 +/- 3.18*s, which is about 1.2 - 2.8?

SSP: Yes.

Or since the t-value is up to 3 we would say 99% of the pieces to fall within that range?

SSP: NO, definitely not.

I understand the role of the t-value in calculating the confidence interval for the mean, my question is what does it have to do with calculating a confidence in the range?

Thanks.

What is happening is that when you have a small sample you are estimating BOTH the average and standard deviation from the sample. This builds up a larger potential error than if you "knew" the standard deviation, or had a very large sample. Thus, the t-distribution is a bit flatter and wider than the normal distribution.

The standard deviation of an individual is sigma
The standard deviation of an average of n individuals is sigma / sqrt (n)

This is how you use the t distribution for both the distribution of the average, and for the individuals.

I do, however, strongly recommend you do some reading of source material on your own.

AJRowe
20th April 2009, 07:51 PM
All the source documentation I've read, including the wikipedia link, speak in terms of calculating a "confidence limit of the mean" with a t-value. I can't find any literature that supports a claim that 95% of a sample's population will be found within a range defined by the mean plus or minus the t-value times the standard deviation. (All assumptions on normality held true.)

I'll keep looking.

Thanks.

Steve Prevette
20th April 2009, 08:21 PM
I must admit you got me curious, so I have been poking around, and yes, all references to the t distribution are related to distribution of the average of a sample. I did find one reference so far at http://davidmlane.com/hyperstat/A48339.html that does relate the t to individuals.

Erik Alburg
21st April 2009, 11:03 AM
I think there is a lot of confusion about the concept of probabilities and that of confidence (statistical).

If I handed you a bag of marbles and told you that there are 9 black marbles and 1 red marble, what would be the chances of pulling out a red marble on the first try? 1 out of 10, or 10%. The 10% represents the probability of a success.

When looking at any probability density distribution such as the Normal distribution or t distribution, the area under the curve represents 100% probability. So when we say that 95.45% of the parts should fall between +/- 2 standard deviations of a normal distribution then we are talking about a probability.

Statistical confidence, on the other hand, deals with how sure we are that a certain parameter of a group of data is what we believe it is. Typically confidence levels are calculated for parameters such as means, medians, modes, variance, etc. and not typically on an entire group of data.

For example, let's say I handed you 100 bags of 10 marbles each and told you that, based upon my past experience, I believe there is one red marble (defect) on average in a bag. So this time you look at all the marbles in the 100 bags (synonymous with subgroups in SPC) and you find that in 95 of the 100 bags there are 0-3 red marbles.

In this case you would be 95% confident that the average defect rate per bag was between 0 and 3 marbles. (This is conceptual, of course.) Since my assumption was that there was one defect per bag, and this point estimate falls within the confidence bands, there is no evidence to contradict my claim.

Tim Folkerts
21st April 2009, 12:13 PM
I don't quite agree with this interpretation. First I don't like the idea of only looking at 95 of the 100 results. Anytime you throw out data, you run into potential problems.

In your example, I think you would be 100% confident that 95% of the bags had 0-3 red marbles. But the other 5 bags might have all red marbles, and this could throw the average up over 3. Without knowing the distribution of marbles within the bags you could not really make a prediction about the confidence.

I will agree that finding up to 3 marbles in any given bag is not enough to contradict "10% average defects", but when you have more bags, your estimate of the average gets much better and you may well be able to claim that the average is not 10%.

Even if you found all 100 bags had 0-3 marbles, there could easily be evidence to contradict the claim of an average of 1 defect. With 10% defective on average, for 1,000 marbles tested total, you would be 95% confident of getting between about 0.8 and 1.2 marbles average per bag (if my math is right). So any average outside of 0.8-1.2 marbles in the 100 bags would be significant evidence of a change.
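
The 0.8 to 1.2 figure can be checked with a normal approximation to the binomial. This is a sketch under that assumption, using the numbers from the marble example (100 bags of 10, 10% red); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, bags, per_bag = 0.10, 100, 10      # figures from the marble example
n = bags * per_bag                    # 1,000 marbles in total
z = NormalDist().inv_cdf(0.975)       # two-sided 95% z-value

expected_red = n * p                          # expected total number of red marbles
sd_red = sqrt(n * p * (1 - p))                # binomial standard deviation of that total
low = (expected_red - z * sd_red) / bags      # convert totals to per-bag averages
high = (expected_red + z * sd_red) / bags

print(f"95% range for the average per bag: {low:.2f} to {high:.2f}")
```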



Tim F

Erik Alburg
21st April 2009, 12:34 PM
Yes you are right... I tried to edit the post and it isn't going through ... I had a brain fart :notme:

You would need to look at all 100 and if the results of 95 of the samples were between 0-3 then you would have 95% confidence.

Moral of the story... read carefully what you write before you hit the submit button.:bonk:

Erik Alburg
21st April 2009, 12:43 PM
The example was a conceptual one... but I really have never had a case where I have reached into a bag of marbles and pulled out 1.2 of them... have you? :lol:

Tim Folkerts
21st April 2009, 02:01 PM
One of the marbles could be broken! :D

AJRowe
21st April 2009, 03:36 PM
All interesting.

To bring the focus back to the open question: What is the basis for the statement that one can expect 95% of a population to fall between the range of Mean +/- T-Value * Std Dev? (all assumptions on normality held true)

Statistical Steven
22nd April 2009, 11:45 AM
AJ -

To sum up the previous comments, a confidence interval only tells you where the mean is expected. Using the 1.96 assumes you have a known standard deviation, usually not known, so I would suggest using the t-distribution. If you want to know how the individuals are distributed, you need to use tolerance intervals. These have both a confidence level and a proportion of the individuals that are within the limits. The width of the confidence interval or tolerance interval is a function of the sample size used to calculate the mean and standard deviation of the sample. Here are some sample tolerance intervals for 95% confidence/95% proportion:

Limit   n=3     n=6     n=10    n=20
Lower   -0.50   1.08    1.28    1.31
Upper    4.50   2.92    2.72    2.69
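
To see why a tolerance factor has to be larger than 1.96, here is a Monte Carlo sketch using only the standard library. It repeatedly draws a normal sample of size n, forms x-bar +/- k*s, and counts how often that interval really contains at least 95% of the population. The value k = 2.752 is the commonly tabulated two-sided 95%/95% factor for n = 20; the function name, trial count and seed are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, mean, stdev

def confidence_of_coverage(n, k, proportion=0.95, trials=5000, seed=1):
    """Monte Carlo estimate of how often xbar +/- k*s, computed from a normal
    sample of size n, contains at least `proportion` of the population."""
    rng = random.Random(seed)
    population = NormalDist(0.0, 1.0)   # standard normal; the result is scale-free
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar, s = mean(sample), stdev(sample)
        covered = population.cdf(xbar + k * s) - population.cdf(xbar - k * s)
        if covered >= proportion:
            hits += 1
    return hits / trials

print(confidence_of_coverage(n=20, k=1.96))    # plain z-value
print(confidence_of_coverage(n=20, k=2.752))   # tabulated 95%/95% factor for n = 20
```

With the plain z-value the interval captures 95% of the population well under half the time, whereas the tolerance factor should bring that up to roughly 95% of the time.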

Hope that helps.

Bev D
22nd April 2009, 01:55 PM
Yes, if you want to know what proportion of a population lies within some +/- range you would use tolerance intervals, not confidence intervals. Typical, easy-to-find formulas for tolerance intervals assume a Normal distribution, which is not universal, so use caution and do a test for normality.



"Using the 1.96 assumes you have a known standard deviation, usually not known, so I would suggest using the t-distribution."

Well, the 1.96 can be used for 'larger sample sizes', approximately greater than or equal to 30. It comes of course from the Z tables for a Normal distribution. As our friend from Guinness 'discovered', small sample averages deviate from the Normal enough to warrant their own distribution: the Student's t. I have found this rule of thumb for switching from the t distribution to the Z distribution to be acceptable for most applications.

Isn't the working (applied) assumption that we can estimate the population standard deviation from the sample standard deviation? The assumption being that it will be 'close enough' and that using the t distribution for small samples (which will have more inaccuracy in the estimate of the standard deviation than larger samples) will in effect compensate for small sample standard deviation error?

PatrickB
5th May 2009, 05:12 PM
When is it appropriate to use Prediction Intervals instead of Tolerance Intervals? I don't understand why they are different if both used for predictions.

Statistical Steven
5th May 2009, 07:15 PM
There is a subtle difference between the two. A prediction interval gives 100(1-alpha)% confidence that the "next" value will be in the interval. A tolerance interval gives 100(1-alpha)% confidence that a certain percentage of the individuals will be contained in the interval.

In other words, a prediction interval covers a single future observation, based on the mean and standard deviation estimated from a sample. A tolerance interval says that, given the mean and standard deviation and the sample size used to estimate them, a certain percent of the individuals will be in the interval with a certain confidence.
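
As a sketch of the prediction-interval idea, the simulation below uses the usual normal-theory form x-bar +/- t*s*sqrt(1 + 1/n) and checks how often one new observation lands inside it. It assumes normal data and reuses n = 4 with the t-value 3.182 quoted earlier in the thread; the function name, trial count and seed are illustrative choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

def next_value_coverage(n, multiplier, trials=20000, seed=1):
    """Fraction of trials in which one NEW normal observation falls inside
    xbar +/- multiplier * s, with xbar and s taken from a sample of size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar, s = mean(sample), stdev(sample)
        new_value = rng.gauss(0.0, 1.0)
        if abs(new_value - xbar) <= multiplier * s:
            hits += 1
    return hits / trials

n, t = 4, 3.182                                       # t-value for n = 4 (3 degrees of freedom)
print(next_value_coverage(n, t * sqrt(1 + 1 / n)))    # prediction interval
print(next_value_coverage(n, t))                      # mean +/- t*s, without the sqrt term
```

The first figure should come out close to 95%, and the second somewhat lower, which is one way to see that mean +/- t*s on its own is neither a prediction interval nor a tolerance interval.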

I hope this makes sense. If not, please send me a PM.