Process capability in a single batch

Eliud Kipchoge

Registered
Hello,

Excuse me for asking, but I have an issue that is driving me crazy. I have a single batch containing 110k datapoints. These datapoints are recorded automatically and include both good and bad products. Bad products are discarded immediately afterwards and therefore do not continue in the process.

We usually calculate process capability parameters for each batch, filtering only for the good products, using the reasoning that a bad product is 'only' a yield problem. Is this approach correct?

To calculate the capability, we use Cp and Cpk. It is only one batch, but with over 100k datapoints (around 9k are gone after removing the bad products). I understand that Cp and Cpk both use sigma (in other words, the population standard deviation, the one with N in the denominator). I also know that Pp and Ppk (which we usually do not calculate) use s (i.e., the sample standard deviation, the one with N-1 in the denominator).

So far, for each batch, we have always used Cp and Cpk. Therefore, we calculated the parameters with sigma (labeled OVERALL in the Minitab software). Is it acceptable to use Cp and Cpk in our case (a single batch, but with a lot of datapoints), or should we use Pp and Ppk for each individual batch (and if so, when would Cp/Cpk apply)? If Pp and Ppk, what would my subgroup size be?
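To make the sigma-versus-s question concrete, here is a quick sketch (all numbers hypothetical) comparing the two estimators at roughly this sample size. The two differ only by a factor of sqrt(N/(N-1)), which is negligible at N = 100k:

```python
import math
import random

random.seed(0)

# Hypothetical filling data: ~100k points, mean 250, SD 4
data = [random.gauss(250.0, 4.0) for _ in range(100_000)]
n = len(data)
mean = sum(data) / n

# Population SD (N in the denominator) vs sample SD (N - 1)
ss = sum((x - mean) ** 2 for x in data)
sigma_pop = math.sqrt(ss / n)
sigma_samp = math.sqrt(ss / (n - 1))

# Cp = (USL - LSL) / (6 * SD), with hypothetical spec limits 238 / 262
usl, lsl = 262.0, 238.0
cp_pop = (usl - lsl) / (6 * sigma_pop)
cp_samp = (usl - lsl) / (6 * sigma_samp)

# The two estimators differ only by sqrt(N / (N - 1)) ~ 1.000005 at N = 100k
print(sigma_samp / sigma_pop)
print(abs(cp_pop - cp_samp))  # effectively zero
```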

There is a final twist (the bane of my existence...). The machine prints out a report for the batch. This report states both specification limits, the mean of the datapoints, the standard deviation, RSD% (relative standard deviation %), and Cp and Cpk (named exactly like this, Cp and Cpk, not Pp(k)). The spec limits and mean are fine. For "standard deviation" they calculate sigma (population), which I agree with (let's suppose the value is 4).

But for the RSD% calculation, they do not use 4 (the population std dev); they use 3.26, which is the value Minitab calculates for the sample standard deviation (taking the subgroup size as 1, which I do not understand...). And for Cp and Cpk they again use the sample standard deviation. If Pp and Ppk should be calculated here, then the report should show s and not sigma. But if Cp and Cpk should be calculated, then the parameters are plain wrong! Which situation is it?
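For what it's worth, if the 3.26 comes from Minitab treating the data as subgroups of size 1, then it may not be the plain sample standard deviation at all: Minitab's default "within" estimate for subgroup size 1 is the average moving range divided by d2 = 1.128. A small sketch with made-up numbers showing how that estimate is formed and how it can differ from the overall sigma:

```python
# Made-up illustrative data, not the real batch
data = [9.8, 10.1, 10.0, 10.4, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1]

n = len(data)
mean = sum(data) / n

# Overall (population) SD -- what the report labels "standard deviation"
sigma_overall = (sum((x - mean) ** 2 for x in data) / n) ** 0.5

# "Within" SD from the average moving range of consecutive points,
# divided by the d2 constant for n = 2 (Minitab's default when the
# subgroup size is 1)
mr_bar = sum(abs(a - b) for a, b in zip(data[1:], data)) / (n - 1)
sigma_within = mr_bar / 1.128

print(sigma_overall, sigma_within)  # the two estimates generally differ
```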

Sorry for the sea of doubts. Any help or insights would be highly appreciated. Thanks a lot!!
 

Bev D

Heretical Statistician
Leader
Super Moderator
There are so many incorrect things in this post that I don't know where to begin. No, actually I do. You are doing the right thing by coming here to ask questions. Let me start with the most important things:
1. Process capability indices are all but useless. They should be avoided at all costs. They may be used by a lot of people, but that doesn't make them right.
2. It is completely stupid to calculate a process capability number for a single batch. Capability indices are intended for whole processes where 100% data is not available.
3. You NEVER exclude the out of spec parts in the calculation. If you do you are only reporting the distribution of the shipped batch of material which is not the capability of the process at all…
4. The concern with which formula to use is misplaced. With your sample size there is no real difference.
5. In general Cpk is intended to use the within-subgroup sample standard deviation, which should have n-1 in the denominator. Ppk is the overall capability index, which uses all of the data in the (global) SD. Sometimes software - and a misguided automotive approach for pre-production calculations - reversed these indices. In the very beginning there was only Cpk, where the global SD was used, BUT there was no prediction of yield or translation to a defect rate or reliability of a Normal distribution.
6. I will let Miner or someone else untangle the math at the end of your post, but it is irrelevant in the end because of 1, 2, 3, 4 and 5 above.

For references and a longer explanation, see my article on "Statistical Alchemy" in the Resources section. There is simply no way to have a valuable discussion of this topic in a simple thread…
 

Eliud Kipchoge

Registered
First and foremost, thank you for your speedy reply.

1. Process capability indices are all but useless. They should be avoided at all costs. They may be used by a lot of people, but that doesn't make them right.

I do not like them either, because the number itself does not help me improve my process, but my current job situation forces me to calculate and report them. This is why I'm trying to learn more.

2. It is completely stupid to calculate a process capability number for a single batch. Capability indices are intended for whole processes where 100% data is not available.

Please allow me to rephrase. I calculate process capability for a single operation (in this case, filling) but not for the whole process itself (i.e., not combining mixing + filling steps). Only that operation. Is it bad to use process capability in this case?

3. You NEVER exclude the out of spec parts in the calculation. If you do you are only reporting the distribution of the shipped batch of material which is not the capability of the process at all…

In the original 110k datapoints, I have several that are machine errors during set-up (artificial values of 999999). I remove these because they are noise and fake data.

There are others, not many, that happen to be 0 or -1. Those are genuinely unfilled (the scales before and after the filler measure the same). These get rejected automatically, and I also reject them. If I calculate any process parameter with those values included, I get values of literally 0.05 and -0.05 for my process, and the mean is far lower than it should be.

Lastly, there are datapoints with the correct amount of filled liquid but missing stoppers, which is a "subproduct" of another operation (stopper loading). I also discarded these, because they are not related to the filling process itself. I think this point can be controversial or might need further explanation. Do you think these should stay in the data, along with the out-of-spec parts?
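The filtering described above (dropping set-up sentinels and true non-fills, but leaving the out-of-spec question open) can be sketched like this, with entirely hypothetical values:

```python
# Hypothetical raw readings: real fills mixed with a set-up sentinel
# (999999) and true non-fills (0 or -1)
raw = [250.3, 999999.0, 249.8, 0.0, -1.0, 251.1, 250.0]

SENTINEL = 999999.0  # artificial value logged during machine set-up

# Keep only physically real fill weights; out-of-spec but real values
# would still pass this filter
cleaned = [x for x in raw if x != SENTINEL and x > 0]

print(cleaned)  # [250.3, 249.8, 251.1, 250.0]
```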

4. The concern with which formula to use is misplaced. With your sample size there is no real difference.

I feel this is true indeed. The sample size is also why I don't bother with normality (the data is not normal, which worries me much more than anything else...).

5. In general Cpk is intended to use the within subgroup sample standard deviation which should have n-1 in the denominator. Ppk is the overall capability index which uses all of the data in the (global) SD. Sometimes software - and a misguided automotive approach for pre-production calculations reversed these indices. In the very beginning there was only Cpk where the global SD was used BUT there was no prediction of yield or translation to defect rate or reliability of a Normal distribution.

I must have been misguided then. I thought Cp(k) used the population SD, not the other way around. Perhaps I read about the early history of the parameters... Along these lines, is using MR/d2 a better alternative than using the sample/global SD?

For references and a longer explanation see my article on “statistical Alchemy“ in the Resources section. There is simply no way to have a valuable discussion regarding this topic in a simple thread…

This one, right?
elsmar.com/elsmarqualityforum/resources/statistical-alchemy.70/

Will look at it thoroughly. Again thank you for your knowledge!!
 

Bev D

Heretical Statistician
Leader
Super Moderator
I will only comment on the normality statement. Normal distributions are not normal… In a filling process I would expect a truncated distribution.
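To see why (a toy simulation with hypothetical numbers): with 100% inspection, the data you keep is a truncated version of what the process actually produces, and its standard deviation understates the process variation, which inflates any capability number computed from it:

```python
import math
import random

random.seed(1)

# Hypothetical filling process with 100% inspection: units outside the
# spec limits are removed, so the retained data is a truncated Normal
lsl, usl = 245.0, 255.0
all_units = [random.gauss(250.0, 4.0) for _ in range(100_000)]
shipped = [x for x in all_units if lsl <= x <= usl]

def pop_sd(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

# Truncation shrinks the SD, so capability computed on the shipped
# units alone overstates the capability of the process
print(pop_sd(all_units), pop_sd(shipped))
```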
 