Calculation of the centerline for S-Charts

glarson

I've been searching posts and have not found an answer to my question, so I've decided to ask for advice.

I have been reviewing some SPC charts this past week, and I've come across something that's been bugging me and leading me to ask "Was I taught wrong?" Specifically, I have noticed that there were at least three different ways (calculations) of determining the centerline (S-bar) of the charts. Now maybe they are all correct, and maybe none are.

The first calculation of the centerline is simply the average of all the grouped Std. deviations.

The second calculation of the centerline is the pooled std. dev for when one assumes the population std. dev are unknown but equal.

The third calculation of the centerline is the pooled std. dev for when one assumes the population std. dev are unknown and not equal.

I have uploaded the three equations.

Here is some background:
Several of the s-charts that I have been reviewing involve subgroups of differing sample sizes (2 to 10 samples per group). Some involve subgroup sizes of 2.

Most of the calculations for the centerline use the first calc of averaging the grouped std. deviations. I've been looking online and found that most sample s-charts use this first calculation too.

When I was a grad student, I was shown two different ways to calculate the centerline of S-Charts. I was taught to use the first calculation, with the source being "Probability and Statistics for Engineers and Scientists" 6th ed. by Walpole, Myers, Myers, 1998, p. 651. The second calc. (the pooled std. dev. for equal population std. dev.) is used to calculate the centerline in "Introduction To Statistical Quality Control" 4th ed. by Douglas C. Montgomery, 2001, p. 245 (for variable sample sizes). My graduate education is about 9 years old now.

I recognize that all three calculations are expected to be only estimates of the population std. deviation, but I've been working with the different calculations with a practice set of data, and I am getting larger than expected differences between the three calcs. 1st calc = 0.34; 2nd calc = 0.44, 3rd = 0.54.

When should each of these calculations be used to calculate the centerline of the S-Chart, if any? This question becomes more critical as S-Bar (centerline) is used to calculate the LCL and UCL for the charts I'm reviewing.

I welcome your comments and criticisms.

Thanks,

GL
 

Attachments

  • std deviations.pdf

Bev D

Heretical Statistician
Leader
Super Moderator
The first formula you list is the formula for estimating the average SD when you have a constant sample size.

The second formula is used when you have a varying sample size.

The third formula should never be used. The reason is that if you have different variances you have an unstable process by definition.

As an aside, the whole idea of a known population SD is a theoretical nicety; in practice it is never known. The original formulas, which use the within-subgroup SD and its average, are the formulas to use, as they were designed with the real world in mind.
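To make the difference between the first two calculations concrete, here is a minimal Python sketch. It assumes the second formula is the pooled form sqrt(sum((n_i - 1) * s_i^2) / (sum(n_i) - m)) for m subgroups, as in Montgomery's treatment of variable sample sizes. The function names and the data are made up for illustration, and the third formula is left out because it appears only in the attachment.

```python
import math

def mean_s(stdevs):
    """First calculation: plain average of the subgroup std. deviations."""
    return sum(stdevs) / len(stdevs)

def pooled_s(sizes, stdevs):
    """Second calculation (assumed form): pooled std. dev., weighting each
    subgroup variance by its degrees of freedom (n_i - 1)."""
    numerator = sum((n - 1) * s ** 2 for n, s in zip(sizes, stdevs))
    denominator = sum(sizes) - len(sizes)
    return math.sqrt(numerator / denominator)

# Hypothetical subgroups with unequal sizes (not the poster's data).
sizes = [2, 5, 10, 4]
stdevs = [0.20, 0.35, 0.50, 0.30]

print(f"average of s_i: {mean_s(stdevs):.4f}")
print(f"pooled s:       {pooled_s(sizes, stdevs):.4f}")
```

With unequal subgroup sizes the two values generally differ, because the pooled form gives the larger subgroups more weight.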
 

Bev D

OOPS, I forgot. When you have unequal sample sizes you should also have variable limits based on the sample size of each subgroup. The formulas for this are in the attached PDF document.
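The formulas in the PDF are not reproduced in the thread, so the following is only a sketch of the usual textbook approach, assuming 3-sigma limits of the form LCL_i = B3(n_i) * sbar and UCL_i = B4(n_i) * sbar, with B3 and B4 derived from c4. The function names and the centerline value are illustrative assumptions.

```python
import math

def c4(n):
    """Bias-correction constant for s, valid for normal data and n >= 2."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)

def s_chart_limits(s_bar, n):
    """Return (LCL, UCL) for a subgroup of size n, assuming B3/B4 limits."""
    k = 3.0 * math.sqrt(1.0 - c4(n) ** 2) / c4(n)
    b3 = max(0.0, 1.0 - k)  # the lower limit is truncated at zero
    b4 = 1.0 + k
    return b3 * s_bar, b4 * s_bar

s_bar = 0.44  # illustrative centerline value
for n in (2, 5, 10):
    lcl, ucl = s_chart_limits(s_bar, n)
    print(f"n={n:2d}  LCL={lcl:.3f}  UCL={ucl:.3f}")
```

The limits narrow as the subgroup size grows, which is why a chart with varying n_i shows stepped limits rather than straight lines.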
 

Statistical Steven

Statistician
Leader
Super Moderator
A better question is why do we take the straight average of standard deviations (or weighted if unequal sample size)? We always teach people to take the square root of the average variance. Why not here?

(See Bev's answer previously for the hint).
 
glarson

You're funny. At a constant ni the pooled std dev reduces to the ave of the grouped std deviations.
 
glarson

Bev, thank you.

You have given me substantiation that I have been doing it properly. Unfortunately, several of the charts that I've been reviewing were not calculated properly, specifically those that involve variable subgroup sizes. With the help of your post (it's funny how it's easier for people to believe a stranger than a coworker) I hope correcting matters will go more smoothly than it has. Yes, I will be showing your responses to coworkers. Hope you don't mind.

When I came across the use of the third calculation for unequal variances in a chart, I had a hard time containing my bewilderment. I was told that this calc was used because each subgroup has its own std. dev. I explained as you did, that if each subgroup has a different population std dev. then the process is not in a state of control. Though they agreed with my statement, they did not think the formula was wrong. Since I'm not a statistician, of course I don't know more than an engineer...<sigh> Sorry for venting.

Again, Thank you.
 
glarson

I did not read your teaser properly. I did not read what you had in () and got in my head you were asking what I answered.

s^2 (the sample variance) is an unbiased estimator of sigma^2 (the population variance), but s (the sample std dev) is not an unbiased estimator of sigma (the pop. std dev). sbar/c4, though, is an unbiased estimate of sigma.

Am I on the right track?
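A quick way to check this numerically is a small simulation. The sketch below assumes normally distributed data and a stable process with sigma = 1; the seed, subgroup size, and number of subgroups are arbitrary choices for illustration.

```python
import math
import random
import statistics

def c4(n):
    """Bias-correction constant for s, valid for normal data and n >= 2."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)

random.seed(1)                      # arbitrary seed, for repeatability only
sigma, n, m = 1.0, 5, 20000         # true sigma, subgroup size, subgroup count

# Sample std. dev. (n - 1 divisor) of each simulated subgroup.
s_vals = [
    statistics.stdev([random.gauss(0.0, sigma) for _ in range(n)])
    for _ in range(m)
]

mean_var = statistics.fmean([s * s for s in s_vals])  # should be near sigma^2
mean_s = statistics.fmean(s_vals)                     # should be near c4 * sigma

print(f"average s^2 = {mean_var:.3f}  (sigma^2 = {sigma ** 2:.3f})")
print(f"average s   = {mean_s:.3f}  (c4 * sigma = {c4(n) * sigma:.3f})")
print(f"sbar / c4   = {mean_s / c4(n):.3f}")
```

The average of s falls short of sigma by roughly the factor c4, while the average of s^2 and the corrected sbar/c4 both land close to the true values.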
 

Statistical Steven


You are correct, A3 is just 3/(c4*sqrt(n)). But the calculation of sbar is not simply the average of the standard deviations. Why can we just average standard deviations to get sbar? The hint is that the process is stable and the assumption is that each Si is equal... correct?
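For reference, the relation between A3 and c4 can be written out as follows, assuming normally distributed data and a constant subgroup size n:

```latex
\[
E[s] = c_4\,\sigma
\;\Rightarrow\;
\hat{\sigma} = \frac{\bar{s}}{c_4},
\qquad
\bar{\bar{x}} \pm 3\,\frac{\hat{\sigma}}{\sqrt{n}}
= \bar{\bar{x}} \pm \frac{3}{c_4\sqrt{n}}\,\bar{s}
= \bar{\bar{x}} \pm A_3\,\bar{s}
\]
```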
 