Could Ppk (Overall capability) be lower than the lowest Cpk (sample capability)?

notneo

Registered
Is that because it is considered proprietary, or because you cannot find an option to upload it? If the former, can you add/subtract a constant to all the data so it anonymizes the data? If the latter, once you make a total of 5 posts, you will be able to upload files.
It's the former. I will prepare the data and try to upload it.
 

Bev D

Heretical Statistician
Leader
Super Moderator
Here is a bit more in-depth information on why "outlier tests" are not appropriate for capability studies of any kind. This is posted for you and the other lurkers who will no doubt read this thread now or in the future…

Remember that the aforementioned reason for outlier detection is to 'signal' that there may (or may not) be an invalid result that is not real. If you can validate through scientific knowledge and logic that the result is a typo, a mis-reading, an invalid test, an impossible result, etc., then that data point may be censored. If the result is real (it actually happened) you cannot censor it, as it will occur again in the future. Remember too that most outlier detection formulas use 95% 'fences'. This means that if you have 20 values, on average about 1 of them (5%) will fall beyond the fences even if you have a well-behaved bell-shaped distribution. If you have a skewed or uniform-type distribution the number of values beyond the fences can be even greater, since most outlier detection formulas are based on a theoretically Normal distribution. If your process actually looks like the diagram you posted (lot-to-lot variation being much larger than within-lot variation) you have a near-uniform distribution. The outlier detection is only detecting values that are in fact real and 'out there'.
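
For anyone who wants to see the "fences flag real values" point in numbers, here is a minimal simulation sketch. It is illustrative only (not the OP's data), and the two fence definitions below are generic textbook choices, not JMP's exact defaults:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 20, 10_000

def beyond_95_fences(x):
    # "95% fences": mean +/- 1.96 * sample SD
    m, s = x.mean(), x.std(ddof=1)
    return np.sum((x < m - 1.96 * s) | (x > m + 1.96 * s))

def beyond_tukey_fences(x):
    # Tukey-style fences: 1.5 * IQR beyond the quartiles, another common outlier rule
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return np.sum((x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr))

bell = [rng.normal(size=n) for _ in range(trials)]        # well-behaved process
skewed = [rng.lognormal(size=n) for _ in range(trials)]   # skewed process

print("avg flagged per 20 points, 95% fences,   bell-shaped:", np.mean([beyond_95_fences(x) for x in bell]))
print("avg flagged per 20 points, Tukey fences, bell-shaped:", np.mean([beyond_tukey_fences(x) for x in bell]))
print("avg flagged per 20 points, Tukey fences, skewed     :", np.mean([beyond_tukey_fences(x) for x in skewed]))
```

Every point flagged here is a perfectly real value from a process with nothing wrong; censoring such points only biases the capability estimate.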

Capability calculations are not intended to be done until the process in question is shown to be stable and predictable. Capability calculations on an unpredictable process are meaningless. We determine stability and predictability using an appropriate control chart. When the control chart indicates an assignable - and correctable* - cause, the process is corrected to eliminate or control the assignable cause. Some people incorrectly refer to the control chart rules as 'outlier detection' or directly substitute outlier detection math (such as the default JMP outlier detection formula you have used) for control limits. This is a misuse of outlier detection math and reflects a fundamental misunderstanding of Control Charts and how they work.
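
To make the distinction concrete, here is a minimal sketch of Xbar-chart control limits. Control limits are +/-3 sigma of the plotted subgroup means, with sigma estimated from within-subgroup variation; the pooled SD used below is an assumption of this sketch, since charting software often uses Rbar/d2 or Sbar/c4 instead. They are not 95% fences on the raw data, which is why substituting outlier math for control limits misses the point:

```python
import numpy as np

def xbar_chart_limits(subgroups):
    """Centre line and 3-sigma control limits for the subgroup means, with sigma
    estimated from the pooled within-subgroup standard deviation."""
    means = np.array([g.mean() for g in subgroups])
    n = len(subgroups[0])                       # assumes equal subgroup sizes
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in subgroups)
    df_within = sum(len(g) - 1 for g in subgroups)
    sd_within = np.sqrt(ss_within / df_within)
    center = means.mean()
    half_width = 3 * sd_within / np.sqrt(n)
    return center - half_width, center, center + half_width

rng = np.random.default_rng(2)
subgroups = [rng.normal(10.0, 0.5, size=5) for _ in range(25)]
lcl, cl, ucl = xbar_chart_limits(subgroups)
print(f"LCL={lcl:.3f}  CL={cl:.3f}  UCL={ucl:.3f}")
```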

Once the control chart demonstrates stability and predictability, it is time for the capability (Cpk, short-term capability) and Ppk (long-term / performance 'capability') calculations. This is also why calculating capability indices for each lot or day is plain stupid and misleading. Think about it…
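
As a rough illustration of what the two indices respond to (a sketch only; software packages differ in the exact within-sigma estimator, e.g. Rbar/d2 versus the pooled SD used here), Cpk is driven by within-subgroup spread while Ppk also picks up the lot-to-lot shifts:

```python
import numpy as np

def cpk_ppk(subgroups, lsl, usl):
    data = np.concatenate(subgroups)
    mean = data.mean()
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in subgroups)
    df_within = sum(len(g) - 1 for g in subgroups)
    sd_within = np.sqrt(ss_within / df_within)    # pooled within-subgroup SD
    sd_overall = data.std(ddof=1)                 # overall (long-term) SD
    cpk = min(usl - mean, mean - lsl) / (3 * sd_within)
    ppk = min(usl - mean, mean - lsl) / (3 * sd_overall)
    return cpk, ppk

rng = np.random.default_rng(7)
# stable process: 20 lots of 5 parts, all from the same distribution
stable = [rng.normal(10.0, 0.5, size=5) for _ in range(20)]
# unstable process: same within-lot spread, but the lot mean wanders
unstable = [rng.normal(rng.normal(10.0, 1.0), 0.5, size=5) for _ in range(20)]

print("stable   Cpk=%.2f  Ppk=%.2f" % cpk_ppk(stable, 8.0, 12.0))
print("unstable Cpk=%.2f  Ppk=%.2f" % cpk_ppk(unstable, 8.0, 12.0))
```

In the unstable case the per-lot numbers can look fine while Ppk collapses, which is exactly why quoting a capability index per lot or per day misleads.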

If you want to know a little more I suggest reading my paper on Statistical Alchemy in the resources section…

Control Charts and other quality engineering methods are not simple or easy. They are as complex as any other engineering or scientific discipline. It takes time to study and understand to become knowledgeable and proficient in their use. It is not as simple as reading something a google search turned up and then throwing some data into a software package and thinking the software will churn out the right answer.


*Some assignable causes are inherent in a process and must be dealt with differently than simple 'elimination'. For example, tool wear that is stable and predictable but shows a definite trend as the tool wears. This is an assignable cause, as we know it is tool wear, but changing out tools at the first sign of wear is often untenable. So we allow the wear to continue until it reaches a certain point, before it makes out-of-spec parts. This is control of an assignable cause rather than elimination.
 

Semoi

Involved In Discussions
One possible explanation for Pp > Cp is that your dataset contains "short term" anti-correlations; these inflate SD_within. A second explanation is "small" sample sizes: the uncertainties of Cpk and Ppk are "large", and we should not over-interpret the calculated estimates.

As you have correctly stated, this is an unusual result, but it is mathematically allowed. Just check out the example I added.
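
To make the anti-correlation mechanism tangible, here is a minimal sketch of my own (not necessarily how the attached example was generated): forcing every subgroup onto a common target induces a within-subgroup correlation of -1/(n-1), which makes the pooled within-subgroup SD come out larger than the overall SD:

```python
import numpy as np

rng = np.random.default_rng(42)
n_sub, n = 20, 5                       # 20 subgroups of 5 observations
target, sigma = 10.0, 1.0

subgroups = []
for _ in range(n_sub):
    e = rng.normal(0.0, sigma, size=n)
    # centring each subgroup on the target induces a within-subgroup
    # correlation of -1/(n-1) among the observations
    subgroups.append(target + e - e.mean())

data = np.concatenate(subgroups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in subgroups)
sd_within = np.sqrt(ss_within / (n_sub * (n - 1)))   # pooled within-subgroup SD
sd_overall = data.std(ddof=1)

print(f"SD_within  = {sd_within:.3f}")    # close to sigma
print(f"SD_overall = {sd_overall:.3f}")   # close to sigma*sqrt((n-1)/n), i.e. smaller
```

With SD_within above SD_overall, the within-based indices (Cp/Cpk) come out below the overall ones (Pp/Ppk).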
 

Attachments

  • SD_within_exceeds_global.tar
    6.5 KB · Views: 20

Miner

Forum Moderator
Leader
Admin
This looks like a problem with rational subgrouping: 19 of 20 points are within 1 SD of the center line, which fails the diagnostic test for rational subgroups. Plus, it looks nothing at all like what the OP posted.

[attached chart screenshots]
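
For reference, the kind of diagnostic being applied here can be sketched as a run rule: a long run of points hugging the centre line (e.g. 15 in a row within 1 sigma, the classic Western Electric / Nelson "stratification" signal) points to irrational subgrouping. The run length and the sigma used below are illustrative assumptions, not Minitab's or JMP's exact test:

```python
import numpy as np

def stratification_signal(points, center, sigma_of_points, run_length=15):
    """True if `run_length` consecutive points all sit within +/-1 sigma of the
    centre line -- too little point-to-point variation for the limits in use."""
    within = np.abs(np.asarray(points) - center) < sigma_of_points
    run = 0
    for flag in within:
        run = run + 1 if flag else 0
        if run >= run_length:
            return True
    return False

rng = np.random.default_rng(3)
# subgroup means that are far tighter than the sigma the chart was built with
means = 10.0 + rng.normal(0.0, 0.2, size=20)
print(stratification_signal(means, center=10.0, sigma_of_points=1.0))   # True
```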
 

Bev D

Heretical Statistician
Leader
Super Moderator
Maybe the OP gave us (1) their censored data, (2) a data set that isn't the same as the one they drew a diagram of, OR (3) this is an I-MR chart / data of the subgroup means. ???
 

Bev D

Heretical Statistician
Leader
Super Moderator
So what am I missing? Pp/Ppk is always less than Cp/Cpk. Just look at the damn formula. The only way that Ppk is greater than Cpk is a rare case of irrational subgrouping and sampling frequency.
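
For readers following along, the standard definitions behind that statement are below. The exact within-sigma estimator varies by software, and the variance decomposition is the usual textbook assumption of independent observations within subgroups rather than a universal law:

```latex
C_{pk} = \frac{\min(\mathrm{USL}-\bar{\bar{x}},\ \bar{\bar{x}}-\mathrm{LSL})}{3\,\hat\sigma_{\mathrm{within}}},
\qquad
P_{pk} = \frac{\min(\mathrm{USL}-\bar{\bar{x}},\ \bar{\bar{x}}-\mathrm{LSL})}{3\,\hat\sigma_{\mathrm{overall}}}

\hat\sigma_{\mathrm{overall}}^{2} \approx \hat\sigma_{\mathrm{within}}^{2} + \hat\sigma_{\mathrm{between}}^{2}
\;\Longrightarrow\;
\hat\sigma_{\mathrm{overall}} \ge \hat\sigma_{\mathrm{within}}
\;\Longrightarrow\;
P_{pk} \le C_{pk}.
```

The inequality rests on the additive variance decomposition, i.e. on the usual independence assumptions.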

Also re-reading the original post I think the OP is saying that they are calculating Cpk for every subgroup? This is wrong, so I think the OP needs to clarify this.
 

notneo

Registered
I agree I owe some clarification
  1. As stated in the OP, I am relatively new to capability analysis, so please excuse my mistakes. I decided to post in this forum as the discussions here appear to have made the most sense to me while I am learning.
  2. The diagram in my second post is from a Minitab webpage titled "A Simple Guide to Between/Within Capability" (can't post links yet) and not from our data. It was only to show graphically what I was trying to ask.
  3. Now, coming back to my question: could the overall Ppk be lower than the lowest sample capability (not the overall Cpk)? I agree that calculating sample/subgroup capability is meaningless, but I did it for my own understanding. If I do not ask myself the question, I do not learn.
  4. Upon further analysis, I find that the cases in which I get an overall Ppk lower than the lowest sample capability are those where the process is not stable. For these cases I get a stability index (ratio of overall to sample sigma) >> 1 (see the sketch after this list). All analysis was done in JMP.
  5. Regarding outlier filtering before capability analysis: I was asked to do this. My guess is that it is meant to exclude test artefacts, but I will find out why and get back.
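
A minimal sketch of the stability index described in point 4 -- the ratio of the overall sigma to the within (sample) sigma -- using a pooled within-subgroup SD; JMP's exact estimator may differ:

```python
import numpy as np

def stability_index(subgroups):
    """Overall SD divided by the pooled within-subgroup SD; values well above 1
    say that lot-to-lot variation dominates the within-lot variation."""
    data = np.concatenate(subgroups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in subgroups)
    df_within = sum(len(g) - 1 for g in subgroups)
    sd_within = np.sqrt(ss_within / df_within)
    return data.std(ddof=1) / sd_within

rng = np.random.default_rng(5)
drifting_lots = [rng.normal(rng.normal(10.0, 1.0), 0.3, size=5) for _ in range(20)]
print(f"stability index = {stability_index(drifting_lots):.2f}")   # well above 1
```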
 

Semoi

Involved In Discussions
@Miner: The posted dataset is fake data. I generated it to prove the point that if the within-subgroup data are anti-correlated, SD_within is expected to be larger than SD_global -- assuming that the subgroups are independent. The fact that we get many subgroup averages within +/-1 sigma is a direct consequence of this anti-correlation. However, if we see this as a shortcoming, it is easily possible to adapt the generation process to obtain the following dataset:

[attached screenshots of the adapted dataset]


The (overall) within-subgroup standard deviation is 1.60 and the global standard deviation is 1.41. Also, 11 of the 20 subgroup SDs exceed the global SD.

Pp/Ppk is always less than Cp/Cpk. Just look at the damn formula. The only way that Ppk is greater than Cpk is a rare case of irrational subgrouping and sampling frequency.
Saying that the capability indices always exceed the performance indices due to their mathematical formulas, and then saying that there are "rare cases" of exceptions, is a rather ... let's say weird ... statement. Mathematics does not work this way. Furthermore, expressions such as "Var[total] = Var[within] + Var[between]" do not apply to the case I described.
Bev, this is not the first time you have chosen a rather harsh statement. However, I am happy to repeat myself: I am here to learn. So please post the formulas and the assumptions -- this would certainly be helpful. However, what is not (!) helpful is to post a reference (probably to a Donald Wheeler paper) stating that the subject is well explained there and that you don't want to repeat it here. Please make your point clear.

Bev, I would agree with you if you said that in 99% of all cases the problem of Ppk > Cpk is due to "irrational subgrouping and sampling frequency".
However, anti-correlated subgroup data points are rare in industry, and if the anti-correlation is mild enough, we won't be able to detect it. For example, in my second dataset we have corr = -16.5%, but the standard correlation test yields p = 10.1%; hence, it is not detectable.
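
One rough way to run such a check (not necessarily the test used above) is a Pearson correlation on consecutive within-subgroup pairs; whether a mild anti-correlation reaches significance then depends mostly on how many pairs are available:

```python
import numpy as np
from scipy import stats

def lag1_within_subgroup_corr(subgroups):
    """Pearson correlation (and p-value) over consecutive observation pairs taken
    inside each subgroup; overlapping pairs make this a rough screen, not an exact test."""
    pairs = np.array([(g[j], g[j + 1]) for g in subgroups for j in range(len(g) - 1)])
    return stats.pearsonr(pairs[:, 0], pairs[:, 1])

rng = np.random.default_rng(11)
subgroups = []
for _ in range(20):
    e = rng.normal(0.0, 1.0, size=5)
    subgroups.append(10.0 + e - e.mean())        # mildly anti-correlated subgroups

r, p = lag1_within_subgroup_corr(subgroups)
print(f"r = {r:+.3f}, p = {p:.3f}")
```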

Also re-reading the original post I think the OP is saying that they are calculating Cpk for every subgroup?
Excellent point. I missed that. So let's go back to the original post:

Intuitively this does not make sense to me but wondering if this is possible, at least mathematically?
I would like to understand how Ppk could be lower than the lowest Cpk.
I gave you one "reason" in my first post: if the within-subgroup data points are anti-correlated, we expect SD_within to be inflated; hence Ppk is expected to exceed Cpk -- if the between-sample variance is "small". Furthermore, if it is possible for the overall Cpk to exceed the overall Ppk, then it is also possible for each (individual subgroup) Cpk to exceed the overall Ppk. I even generated a dataset. However, the comparison between the "individual subgroup Cpk" values and the overall Ppk hardly makes sense. So, while there is nothing mathematical that prevents it from happening, Cpk_each > Ppk_overall is very unusual. Thus, you should investigate such a process.
 