# Overall and Within/Between STDEV - Why not the same?

#### bobdoering

Trusted Information Resource
Thanks Bev D for the plots.

The data offers more questions than answers. It does show how jumping to the distributions without the time series is a very, very bad thing.

It appears to have a trend. OK...that poses the next question: Does it make sense that it has a trend? You need to know that before performing any comparison testing, as sampling error may make your data look different when it is statistically the same. (See here for an example of how many different 'sample distributions' can come from the exact same population/process through sampling error alone.) If a trend does make sense, you may need to look at non-normal analysis - and any normality you see in your distributions may be measurement or gage error combined with sampling error (both typically normal and both capable of masking your process variation if large enough).

#### Miner

##### Forum Moderator
So, I understand the concept of WITHIN (subgroup) standard deviation and BETWEEN (subgroup) standard deviation. Also that BETWEEN/WITHIN is root(within^2+between^2).
OVERALL is stdev of all measures.

Question 1:
Why is Overall stdev not equal to between/within? I don't understand.

Question 2:
I have a process improvement where WITHIN, BETWEEN, and WITHIN/BETWEEN stdev all went down (by quite a bit), but OVERALL went up. How can this be? Perhaps related to question 1.
Question 1: I researched some more on this. The between subgroup variation is based on the moving range between subgroup averages. If your process were in control, the Between variation would be equal to zero and the B/W variation would equal the Within variation.

Question 2: Lot PP is very unusual. The mean of this subgroup is not radically different, so it does not influence the Between variation, but its large range inflates the overall variation.
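
A minimal sketch of how the four figures could be computed, for anyone who wants to check the arithmetic by hand. It assumes equal-sized subgroups, a pooled standard deviation for Within, and a Between estimate taken from the average moving range of subgroup means (divided by the usual d2 constant of 1.128 for ranges of two) with the Within contribution removed. The function name and the data are hypothetical, and the exact estimators Minitab applies (unbiasing constants, for instance) may differ.

```python
import math
import statistics

D2_FOR_N2 = 1.128  # control chart constant for a range of two values


def sigma_components(subgroups):
    """Return (within, between, between_within, overall) standard deviations.

    Illustrative only: assumes equal subgroup sizes and is not claimed
    to reproduce any particular software package's output.
    """
    n = len(subgroups[0])
    # Within: pooled standard deviation (root of the mean subgroup variance).
    within = math.sqrt(statistics.mean(statistics.variance(g) for g in subgroups))
    # Between: moving range of consecutive subgroup means, less the part
    # that within-subgroup noise alone would produce, floored at zero.
    means = [statistics.mean(g) for g in subgroups]
    mr_bar = statistics.mean(abs(b - a) for a, b in zip(means, means[1:]))
    sigma_of_means = mr_bar / D2_FOR_N2
    between = math.sqrt(max(0.0, sigma_of_means ** 2 - within ** 2 / n))
    # Between/Within: root sum of squares of the two components.
    between_within = math.sqrt(within ** 2 + between ** 2)
    # Overall: plain sample standard deviation of every measurement.
    overall = statistics.stdev([x for g in subgroups for x in g])
    return within, between, between_within, overall


# Hypothetical lots whose means drift steadily upward.
lots = [
    [10.0, 10.2, 9.8],
    [10.5, 10.7, 10.3],
    [11.0, 11.2, 10.8],
    [11.5, 11.7, 11.3],
    [12.0, 12.2, 11.8],
]
within, between, bw, overall = sigma_components(lots)
print(f"within={within:.3f} between={between:.3f} B/W={bw:.3f} overall={overall:.3f}")
```

With a steady drift like this, the moving range only sees the step between neighbouring subgroups, so the B/W figure comes out smaller than the overall standard deviation of the very same data. The two are different estimators and only agree when the subgroup means wander randomly rather than trend.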

#### puck1263

Thanks all.
The last lot (PP) should be deleted. Something either happened in the process or with the sample (still looking into it), causing the high stdev.

There shouldn't be any trending by time sequence.

Still trying to understand.

#### Bev D

##### Heretical Statistician
Super Moderator
Thanks all.
The last lot (PP) should be deleted. Something either happened in the process or with the sample (still looking into it), causing the high stdev.

There shouldn't be any trending by time sequence.

Still trying to understand.

Exactly what are you trying to understand? Even eliminating the final point changes very little. The Ppk value for the AFTER data set goes up a bit, but all of the same comments made previously still apply.

#### bobdoering

Trusted Information Resource
There shouldn't be any trending by time sequence.

I did not see where you identified what the process is, so we have to take your word for that. But that is a lot different than agreeing or disagreeing with that point. It would be interesting to understand the justification for that point.

#### automoto

##### Involved In Discussions
Some definition for better understanding...

Within variation - the variation due only to the variation within subgroups. It is a good indicator, but only if the process is in statistical control.
Between variation - variation due to the variation between subgroups. If the process is in statistical control this variation should be zero.
Total process variation - a good indicator for a process that is not in statistical control. This variation includes the effect of both special causes and common causes.
If the process is in statistical control the within variation will be very close to the total process variation.

#### bobdoering

Trusted Information Resource
Some definition for better understanding...

Within variation - the variation due only to the variation within subgroups. It is a good indicator, but only if the process is in statistical control.
Between variation - variation due to the variation between subgroups. If the process is in statistical control this variation should be zero.
Total process variation - a good indicator for a process that is not in statistical control. This variation includes the effect of both special causes and common causes.
If the process is in statistical control the within variation will be very close to the total process variation.

These are some very simplistic definitions assuming some theoretical perfections - including that the process variation consists of one normal factor, and it is the only factor that appears in the data. Pretty rare.

For example, some more important definitions (IMO):

Total Variation: the sum of all variations that affect the process. Measurement (and subsequent plotting) of process output will reflect total variation. It will generate a distribution that is the sum of all distributions that affect the process.

Process Variation: The remaining variation when all non-process variation (including common and special causes including measurement and gage error, material changes, etc.) is held to a statistically insignificant level. These variations must be held to a statistically insignificant level prior to determining statistical control of the process variation.

#### Bev D

##### Heretical Statistician
Super Moderator
Between variation - variation due to the variation between subgroups. If the process is in statistical control this variation should be zero.

Those definitions are correct for the generic case of a control chart and the resulting standard deviations. (Bob was discussing sources or inputs of variation as well as additional components of variation such as what one would see in a multi-vari)

Two points of clarification - and these are important to understanding real world control charts:
Even with a homogenous process stream that is in statistical control, there will be some between subgroup variation. (You can prove this to yourself using a random number generator for a perfect Normal distribution.) In this case the between subgroup variation will be sampling error, i.e. the standard error of the mean: S_average = S_Population/sqrt(n), where n is the subgroup size. Until n reaches infinity or the full population size, S_average (= S_between) cannot be zero.
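
The random number generator check can be sketched as follows. The seed, the subgroup size of 5 and the 2000 subgroups are arbitrary choices made for illustration; the process is assumed to be a single stable Normal distribution with a standard deviation of 1.

```python
import math
import random
import statistics

rng = random.Random(1)       # fixed seed so the run is repeatable
sigma, n, k = 1.0, 5, 2000   # population sigma, subgroup size, number of subgroups

# Every subgroup is drawn from the same stable Normal distribution,
# so the simulated process is in statistical control by construction.
subgroups = [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(k)]

# Standard deviation of the subgroup averages versus the theoretical
# standard error of the mean, sigma / sqrt(n).
sd_of_means = statistics.stdev(statistics.mean(g) for g in subgroups)
expected = sigma / math.sqrt(n)
print(f"stdev of subgroup means = {sd_of_means:.3f}, sigma/sqrt(n) = {expected:.3f}")
```

The subgroup averages still scatter by roughly sigma/sqrt(n) even though nothing in the process changes, which is the sampling error described above.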

Most processes don't have homogenous streams and many subgrouping schemes are not rational.

A rational subgroup will contain the variation within the subgroup that is to be "controlled" between subgroups. With homogenous process streams the largest component of variation is piece to piece. So the rational subgroup is to sample several sequential pieces (WITHIN) in each subgroup. Then wait some time (BETWEEN subgroup) and sample the same number of sequential pieces. This is the condition that you described so succinctly.

If you have a non homogenous process stream (out of statistical control, with one or more systemic 'natural' sources of variation such as Bob's now infamous tool wear, natural seasonal effects, large measurement error, a largest component of variation that is within piece or lot to lot, etc.), we will not have 'statistical control' in the standard subgrouping scheme even when our process is perfectly stable, predictable and even capable. We need to change our subgrouping scheme to be rational if we want to apply a control chart type monitor. This will result in a more complex set of components of variation. It is this real world complexity that many new or casual users of SPC and capability indexes don't comprehend, and so they can't understand their results, because the results don't look like the textbook examples that they learned...

#### puck1263

Still trying to understand why Minitab's between/within stdev is different from the overall stdev, and what is the point of reporting both?

Perhaps in method of calculation for stdev estimation?

Also, still trying to understand why within and between stdev would go down, but overall go up.
Is this also an estimation issue? Not sure how to get confidence limit for each calculation.

#### Bev D

##### Heretical Statistician
Super Moderator
Still trying to understand why Minitab's between/within stdev is different from the overall stdev, and what is the point of reporting both?

Perhaps in method of calculation for stdev estimation?

Also, still trying to understand why within and between stdev would go down, but overall go up.
Is this also an estimation issue? Not sure how to get confidence limit for each calculation.

Sorry. I don't use Minitab, and this looks like a "Minitab" thing, or perhaps there was an error in your data entry into Minitab, as your thumbnail results simply don't make sense to me. My analysis of your posted data shows that both within and between go down for the 'after' period when compared to the 'before' period. The total standard deviations and Ppk values follow as well. I have no idea what "between/within" is all about. It is not a common statistic and may be a "Minitab thing"?

The feedback you have received so far addresses your process and its improvement and capability. It is valuable and useful, and it is independent of anything Minitab may be doing... If you still have issues with Minitab you will need a Minitab expert to answer your questions. Any Minitab people out there?