# Shewhart Constants vs Central Limit Theorem in calculating Control Limits

#### Inspector-71

I am confused by the use of Shewhart constants instead of +/- 3 standard deviations in calculating control limits.

Say I have a data set for an xbar-R chart and need to calculate the control limits via the relevant D3 and D4 constants. I believe I have to use these instead of +/- 3 standard deviations because the constants allow for the effects of sample size and different distribution types (if anyone could expand on that, it would be appreciated). Could +/- 3 standard deviations only be used for control limits if the distribution was normal?

However, by the CLT, if I sample any consistent distribution shape with, say, 5 parts per hour, then the means of these samples should, when plotted, form an approximately normal distribution once there are a decent number of means to plot. Since that distribution is normal, the empirical rule can be applied and 99.7% of the data should be within +/- 3 standard deviations.

So why do xbar-R charts not use +/- 3 standard deviations?

To add further confusion, when calculating Pp/Ppk we do use +/- 3 standard deviations and compare this to the specification limits. Why use +/- 3 standard deviations here and not in the control charts?

Thanks for any help.

#### reynald

##### Quite Involved in Discussions

"So why do xbar-R charts not use +/- 3 standard deviations?"
Actually, they do use +/- 3 standard deviations, but the standard deviation is estimated from the average range (R-bar). To get the limits, R-bar is multiplied by a constant that depends on the sample size. The factor of +/- 3 is already incorporated in that constant multiplier.
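
As a rough illustration of that point, here is a minimal Python sketch that builds the limits explicitly as center line +/- 3 sigma and then shows that the usual table constants are the same arithmetic folded into one number. The measurements are made up, the function name `xbar_r_limits` is just for this example, and `d2` and `d3` are the standard tabulated values for subgroups of 5.

```python
import math

# Tabulated constants for subgroups of n = 5
d2 = 2.326  # mean of the subgroup range, in units of sigma
d3 = 0.864  # standard deviation of the subgroup range, in units of sigma

def xbar_r_limits(subgroups):
    """Build X-bar and R chart limits explicitly as center line +/- 3 sigma.

    Assumes every subgroup has 5 values, to match d2 and d3 above.
    """
    n = len(subgroups[0])
    means = [sum(g) / n for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = sum(means) / len(means)
    r_bar = sum(ranges) / len(ranges)

    sigma_hat = r_bar / d2                 # sigma of individual values
    sigma_xbar = sigma_hat / math.sqrt(n)  # sigma of a subgroup average
    sigma_r = d3 * sigma_hat               # sigma of a subgroup range
    return {
        "grand_mean": grand_mean,
        "r_bar": r_bar,
        "xbar_limits": (grand_mean - 3 * sigma_xbar, grand_mean + 3 * sigma_xbar),
        "r_limits": (max(0.0, r_bar - 3 * sigma_r), r_bar + 3 * sigma_r),
    }

# Made-up measurements: 4 subgroups of 5 parts each
data = [
    [10.1, 9.8, 10.0, 10.3, 9.9],
    [10.2, 10.0, 9.7, 10.1, 10.0],
    [9.9, 10.4, 10.0, 9.8, 10.1],
    [10.0, 10.2, 9.9, 10.1, 9.6],
]
limits = xbar_r_limits(data)
print(limits)

# The chart constants from the tables, recovered from d2 and d3
A2 = 3 / (d2 * math.sqrt(5))
D3 = max(0.0, 1 - 3 * d3 / d2)
D4 = 1 + 3 * d3 / d2
print(round(A2, 3), round(D3, 3), round(D4, 3))  # 0.577 0.0 2.114
```

So `grand_mean + A2 * r_bar` and `D4 * r_bar` give the same upper limits as the explicit 3-sigma calculation; the constants only save a few steps.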

"To add further confusion, when calculating Pp/Ppk we do use +/- 3 standard deviations and compare this to the specification limits. Why use +/- 3 standard deviations here and not in the control charts?"
Ppk uses the standard deviation of the overall data, not of the data in its subgrouped form. For Cp/Cpk you use the same constants as in the control charts.

#### Inspector-71

Thank you. I had talked myself into that direction, but it's great to have it confirmed.

#### Steve Prevette

##### Deming Disciple
Staff member
Super Moderator
A lot of the basis for the use of range calculations is that one cannot calculate the standard deviation of a sample very easily with an adding machine and a slide rule. Thus, the reliance (prior to computers) on the use of range to estimate standard deviation.

Today, most authors (especially Dr. Wheeler) are still proponents of the range rather than simply typing "STDEV" into an Excel spreadsheet. One thing that does happen is that if your data has an outlier, the STDEV will "blow up" to a larger quantity (due to the squared differences in the formula) than the range-based estimate will. The moving range also takes into account the sequence of the data, while the standard deviation calculation does not.

I believe an argument can be made for using the standard deviation estimator (even Shewhart documents that it is more statistically "powerful" than the moving range), but it has not been accepted.

I've done a lot of comparisons, and the moving range estimate usually comes out quite close to the sigma standard deviation calculation anyway.
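
For anyone who wants to try such a comparison themselves, here is a small Python sketch. It uses simulated data with a fixed seed (an assumption for illustration, not real process data), estimates sigma from the average moving range divided by d2 = 1.128, and compares it with the ordinary sample standard deviation, with and without one wild value.

```python
import random
import statistics

def sigma_from_moving_range(values):
    """Estimate sigma as (average moving range) / d2, with d2 = 1.128 for n = 2."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return statistics.mean(moving_ranges) / 1.128

random.seed(1)  # fixed seed so the run is repeatable
stable = [random.gauss(50.0, 2.0) for _ in range(200)]

mr_stable = sigma_from_moving_range(stable)
sd_stable = statistics.stdev(stable)
print(f"stable data:  MR-based {mr_stable:.2f}, STDEV {sd_stable:.2f}")

# Same data with a single wild value dropped into the middle
with_outlier = stable.copy()
with_outlier[100] = 90.0

mr_outlier = sigma_from_moving_range(with_outlier)
sd_outlier = statistics.stdev(with_outlier)
print(f"with outlier: MR-based {mr_outlier:.2f}, STDEV {sd_outlier:.2f}")
```

The outlier touches only two moving ranges, while its squared deviation enters the STDEV sum in full, so the STDEV should rise noticeably more than the moving range estimate.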

#### Curtis317

"I've done a lot of comparisons, and the moving range estimate usually comes out quite close to the sigma standard deviation calculation anyways."

The estimate only comes close to the sigma standard deviation if the data is normal. The further it diverges from normal, the bigger the difference will be.

#### Bev D

##### Heretical Statistician
Staff member
Super Moderator
I'm sorry, but this is a common misperception. The standard deviation is not based on Normality, and estimates of the total standard deviation from the within-subgroup variation are not based on Normality either. They are based on the homogeneity of the process stream. IF the data for the I-MR chart is sampled in time order, the process stream is homogeneous, and the moving range is calculated on the time-ordered data, the moving range estimate will be very close to the total standard deviation calculated from all of the individual values... close enough for SPC. Remember that control charts are not precise statistical estimators...

If the process stream is NOT homogeneous, the within-subgroup variation will NOT provide an accurate estimate of the overall standard deviation or of the variation of the subgroup averages. This is exactly why Shewhart used the within-subgroup variation to estimate the between-subgroup variation. If the estimate holds, the process is in statistical control; if it doesn't, it is out of statistical control. (This part is a bit more complicated and involves rational subgrouping and understanding that non-homogeneous processes can be stable and predictable...)
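
A quick way to see the homogeneity point is to simulate it. The Python sketch below is only an illustration under assumed conditions: one stream is homogeneous but clearly non-normal (uniform), the other is normal but has a step change in its mean halfway through. The stream names and parameters are invented for the example.

```python
import random
import statistics

def mr_sigma(values):
    """Sigma estimate from the average moving range (d2 = 1.128 for n = 2)."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return statistics.mean(moving_ranges) / 1.128

random.seed(7)  # fixed seed so the run is repeatable

streams = {
    # Homogeneous but non-normal: one uniform distribution throughout
    "uniform, homogeneous": [random.uniform(0.0, 10.0) for _ in range(500)],
    # Normal but not homogeneous: the mean steps from 50 to 56 halfway through
    "normal, mean shift": (
        [random.gauss(50.0, 1.0) for _ in range(250)]
        + [random.gauss(56.0, 1.0) for _ in range(250)]
    ),
}

ratios = {}
for name, stream in streams.items():
    within = mr_sigma(stream)            # short-term (point-to-point) variation
    overall = statistics.stdev(stream)   # total variation of all values
    ratios[name] = within / overall
    print(f"{name}: MR-based {within:.2f}, overall {overall:.2f}, "
          f"ratio {ratios[name]:.2f}")
```

In this setup the uniform stream should give a ratio near 1 even though it is not normal, while the shifted stream should give a ratio well below 1 even though every value in it is drawn from a normal distribution. Strongly skewed distributions can pull the ratio somewhat further from 1 than the uniform case does.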

#### Curtis317

If you calculate the Ppk and Cpk for the same set of data, they can be very different. The closer they are to each other, the more "Normal" the data will be. I have no issue with your comments about the control charts. The standard deviation is just a statistic generated from the data, and "normality" has nothing to do with the number.

#### Steve Prevette

##### Deming Disciple
Staff member
Super Moderator
"The estimate only comes close to the sigma standard deviation if the data is normal. The further it diverges from normal, the bigger the difference will be."
The standard deviation is the standard deviation! The statistical definition of the standard deviation of a set of data is sqrt( sum((Xi - Xbar)^2) / N ).

For a sample, N-1 is used in the denominator in order to give an unbiased estimator of the population variance.
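
As a small worked example of that definition, with the square root written out explicitly, here is a Python sketch on made-up numbers. It checks the hand calculation against the standard library's `statistics.pstdev` (N in the denominator) and `statistics.stdev` (N-1 in the denominator).

```python
import math
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up data

n = len(values)
mean = sum(values) / n                               # 40 / 8 = 5.0
sum_sq = sum((x - mean) ** 2 for x in values)        # 32.0

population_sd = math.sqrt(sum_sq / n)                # sqrt(32 / 8) = 2.0
sample_sd = math.sqrt(sum_sq / (n - 1))              # sqrt(32 / 7), about 2.138

print(population_sd, statistics.pstdev(values))
print(sample_sd, statistics.stdev(values))
```

Nothing in either calculation refers to the shape of the distribution; it is arithmetic on the data.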

Shewhart even states in Economic Control of Quality of Manufactured Product (page 289): "It appears, therefore, that there is good reason to choose the standard deviations sigma of the sample as the basis for estimate of the standard deviation sigma of the universe to detect a change delta sigma."

#### Bev D

##### Heretical Statistician
Staff member
Super Moderator
"If you calculate the Ppk and Cpk for the same set of data they can be much different. The closer they are to each other the more "Normal" the data will be. I have no issue with your comments about the control charts. The standard deviation is just a statistic generated from the data and "normality" has nothing to do with the number."
Actually, again, Normality still has nothing to do with the situation you describe.
The difference between Cpk and Ppk (in the traditional formulas) is where the SD comes from and the centering of the process within the spec limits. The difference between Cpk and Ppk is exactly like control charts. Cpk uses within-subgroup variation - IF the process is homogeneous, the within-subgroup variation will provide an accurate calculation of the total variation, because the between-sample variation is just sampling error; in other words, the process stream is homogeneous.

The accuracy of the Cpk or Ppk index relative to the actual spread of real values vs the spec limits, and the resulting defect rate, IS dependent on the normality of the process. But within any given process, the closeness of the Cpk and Ppk indices to each other is due to the homogeneity of the process and the centering.
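
To make the Cpk/Ppk comparison concrete, here is a hedged Python sketch using the traditional formulas: Cpk with sigma estimated as R-bar / d2 (within-subgroup), Ppk with the sample standard deviation of all values. The data are simulated with a fixed seed, both data sets are drawn from normal distributions, and the spec limits of 94 and 106 and the function name `capability` are invented for the example.

```python
import random
import statistics

D2_N5 = 2.326  # tabulated d2 for subgroups of 5

def capability(subgroups, lsl, usl):
    """Return (Cpk, Ppk) from the traditional formulas, for subgroups of 5."""
    all_values = [x for g in subgroups for x in g]
    mean = statistics.mean(all_values)
    r_bar = statistics.mean([max(g) - min(g) for g in subgroups])
    sigma_within = r_bar / D2_N5                  # used for Cpk
    sigma_overall = statistics.stdev(all_values)  # used for Ppk
    nearest_spec = min(usl - mean, mean - lsl)
    return nearest_spec / (3 * sigma_within), nearest_spec / (3 * sigma_overall)

random.seed(3)  # fixed seed so the run is repeatable

# Homogeneous process: every subgroup comes from the same normal distribution
homogeneous = [[random.gauss(100.0, 1.0) for _ in range(5)] for _ in range(120)]

# Non-homogeneous process: still normal within each subgroup,
# but the subgroup mean cycles through 98, 100, 102
shifts = [-2.0, 0.0, 2.0]
shifting = [
    [random.gauss(100.0 + shifts[i % 3], 1.0) for _ in range(5)]
    for i in range(120)
]

cpk_h, ppk_h = capability(homogeneous, lsl=94.0, usl=106.0)
cpk_s, ppk_s = capability(shifting, lsl=94.0, usl=106.0)
print(f"homogeneous: Cpk {cpk_h:.2f}, Ppk {ppk_h:.2f}")
print(f"shifting:    Cpk {cpk_s:.2f}, Ppk {ppk_s:.2f}")
```

In this setup the two indices should land close together for the homogeneous stream and clearly apart for the shifting one, even though normal distributions generated both.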

#### Steve Prevette

##### Deming Disciple
Staff member
Super Moderator
I would just suggest that Cpk and Ppk are off-topic for the original question.

Personally, I am no fan of either number. If I really want a good estimate for percent defective, then I would say I need to know the distribution of the source data.