Could Ppk (Overall capability) be lower than the lowest Cpk (sample capability)?

notneo

Registered
I am relatively new to capability analysis and am finding that, for some of my process parameters, the overall capability (Ppk) is lower than the lowest sample capability (Cpk).
Intuitively this does not make sense to me, but I am wondering whether it is possible, at least mathematically.
I would like to understand how Ppk could be lower than the lowest Cpk.
In my case the capability calculations are cross-checked between JMP (SAS) and my own script.
Also, I am using standard outlier filtering (based on Q1 - 1.5×IQR and Q3 + 1.5×IQR) before analysing the data for capability.
 

Miner

Forum Moderator
Leader
Admin
The Pp/Ppk indices are based on the longer term overall variation, which includes between subgroup plus within subgroup variation. The Cp/Cpk indices are based on within subgroup variation only. Therefore, Pp/Ppk should always be lower than their corresponding Cp/Cpk indices.

On a side note, I do not recommend filtering your data. You should be plotting it on the appropriate control chart and verifying that the process is stable (i.e., in a state of statistical control).
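
A minimal sketch of the distinction described above, assuming the within-subgroup sigma is estimated by the pooled standard deviation (packages may use R-bar/d2 or S-bar/c4 instead) and the overall sigma by the ordinary sample standard deviation of all the data. The function name `capability` and the example numbers are hypothetical, not taken from the original poster's data.

```python
import numpy as np

def capability(subgroups, lsl, usl):
    """Return (Cpk, Ppk) for a list of subgroups measured against lsl/usl."""
    groups = [np.asarray(g, dtype=float) for g in subgroups]
    all_data = np.concatenate(groups)
    mean = all_data.mean()

    # Within-subgroup sigma: pooled standard deviation (an assumed estimator).
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_within = sum(len(g) - 1 for g in groups)
    sigma_within = np.sqrt(ss_within / df_within)

    # Overall sigma: sample standard deviation of all values together, so it
    # also picks up any movement of the subgroup means.
    sigma_overall = all_data.std(ddof=1)

    nearest = min(usl - mean, mean - lsl)  # distance to the nearest spec limit
    return nearest / (3 * sigma_within), nearest / (3 * sigma_overall)

# Hypothetical subgroups whose means move around between subgroups.
subgroups = [[9.9, 10.0, 10.1], [10.4, 10.5, 10.6], [9.4, 9.5, 9.6]]
cpk, ppk = capability(subgroups, lsl=8.0, usl=12.0)
print(f"Cpk = {cpk:.2f}, Ppk = {ppk:.2f}")
```

Both indices share the same numerator here, so any gap between them comes only from the two sigma estimates.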
 

notneo

Registered
Thanks. For most of my processes, I find the Ppk to be greater than the lowest sample Cpk.
Only for some processes is the Ppk lower than the lowest sample Cpk.
The picture below intuitively tells me that the overall capability (Ppk) cannot be lower than the lowest sample Cpk.
 

Bev D

Heretical Statistician
Leader
Super Moderator
It’s not about the math - it’s about how you sample, and how many data points you have in your calculation.

Secondly, you should NEVER censor (filter) your data. The only time you remove data is when it is not real or is invalid: an impossible value due to a typo or mis-test. (Outlier tests are only warnings to look at and understand your data in case there is an invalid result; they are not an excuse for throwing out data you don’t like.) If it is not an invalid result, then the data actually happened and will actually occur in the future, and it MUST be included in any capability calculation.

Miner is correct that you should also plot your data on a control chart (although I recommend plotting on a multi-vari chart first, as it is easier to see the within and between subgroup variation, then plotting on an appropriate control chart).
 

Miner

Forum Moderator
Leader
Admin
[Attachment 30873: the picture from notneo's post]
Both Pp and Cp take the tolerance width as the numerator and the variation as the denominator. Long term variation is larger than short term variation, so Pp will always be smaller than Cp. If your process were stable, the distance from the mean to the nearest specification limit would be constant, and the same relationship would hold for Cpk and Ppk. However, your process appears to be very unstable, so on some days the distance from the mean to the nearest spec might be large enough to outweigh the increased long term variation. In any case, your capability indices are meaningless, since your process is very unstable and these indices are unrepeatable. Get the process into a state of control before calculating the capability.
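
To make the original question concrete, here is a small sketch, under stated assumptions, of how Ppk can fall below the lowest per-lot Cpk. It assumes each lot's Cpk is computed from that lot's own mean and standard deviation, and that Ppk uses the mean and standard deviation of all lots combined. The helper `index_k`, the spec limits, and the data are hypothetical, and three points per lot is far too few for a real capability estimate.

```python
import numpy as np

def index_k(x, lsl, usl):
    """Capability index from one sample's own mean and standard deviation."""
    x = np.asarray(x, dtype=float)
    return min(usl - x.mean(), x.mean() - lsl) / (3 * x.std(ddof=1))

LSL, USL = 8.0, 12.0     # hypothetical spec limits
lots = [
    [9.4, 9.5, 9.6],     # lot mean 9.5
    [9.9, 10.0, 10.1],   # lot mean 10.0
    [10.4, 10.5, 10.6],  # lot mean 10.5
]

# Each lot is tight around its own mean, so every per-lot Cpk is high.
per_lot_cpk = [index_k(lot, LSL, USL) for lot in lots]

# Pooling the lots adds the lot-to-lot shifts to the spread.
ppk = index_k(np.concatenate(lots), LSL, USL)

print("per-lot Cpk:", np.round(per_lot_cpk, 2))
print("overall Ppk:", round(float(ppk), 2))
```

With these numbers every lot has a standard deviation of about 0.1, while the combined data has a standard deviation of about 0.44 because the lot means differ, so the overall index comes out well below the smallest per-lot index.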
 

Bev D

Heretical Statistician
Leader
Super Moderator
OR the process is very stable but is naturally non-homogeneous, with lot-to-lot variation larger than within-lot variation. I see it all of the time.

The issue here might be that the OP is calculating Cpk and Ppk with too few samples. Both should be calculated only over a long period of time, not for every lot or day.
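
One way to check whether lot-to-lot variation dominates within-lot variation is a one-way variance components estimate. The sketch below assumes balanced lots (the same number of measurements in each) and uses the standard ANOVA method-of-moments estimator; the function name `variance_components` and the data are hypothetical.

```python
import numpy as np

def variance_components(lots):
    """Balanced one-way layout: return (within-lot, between-lot) variances."""
    data = np.asarray(lots, dtype=float)   # shape: (number of lots, n per lot)
    k, n = data.shape
    lot_means = data.mean(axis=1)

    # Mean square within lots: pooled variance around each lot's own mean.
    ms_within = ((data - lot_means[:, None]) ** 2).sum() / (k * (n - 1))

    # Mean square between lots: spread of lot means around the grand mean.
    ms_between = n * ((lot_means - data.mean()) ** 2).sum() / (k - 1)

    # Method-of-moments estimate; negative estimates are truncated at zero.
    var_between = max((ms_between - ms_within) / n, 0.0)
    return ms_within, var_between

lots = [[9.4, 9.5, 9.6], [9.9, 10.0, 10.1], [10.4, 10.5, 10.6]]
var_within, var_between = variance_components(lots)
print(f"within-lot variance  = {var_within:.4f}")
print(f"between-lot variance = {var_between:.4f}")
```

A between-lot component much larger than the within-lot component is the non-homogeneous pattern described above. It does not by itself say whether the process is unstable; that still needs a control chart.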
 

notneo

Registered
I would like to get some more insight into why one should not filter outliers before analysing the data for capability.
I was assuming, perhaps due to my lack of knowledge, that outliers would affect the standard deviation and that the capability might not be representative if the data included outliers.
Could I please request some more insight on this?

I think in our case lot-to-lot variation is larger than within-lot variation; I will dig deeper into this.
 

Bev D

Heretical Statistician
Leader
Super Moderator
It would be helpful if you posted a data set for us to look at. We are experts at this sort of thing and can help you to understand what is happening…

As for outliers - it is as I have said:
First, outlier tests are intended to signal that there ‘may’ be an invalid result: either a typo (the values expected are 1-2 but someone typed in an 11) or a mis-test. In other words, the data is not real. You must validate that the ‘outlier’ result is invalid before you remove the data point. This is quite rare. But if there is an invalid (impossible) result, it should be removed.

Most outlier tests simply detect extreme values that are real - they are actual results. They will happen again in the future and they must be included in any capability analysis or else you are cheating and lying to yourself about the capability of the process.
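
As a rough illustration of the point above, the following sketch applies the Q1 - 1.5×IQR / Q3 + 1.5×IQR rule to a small made-up data set that contains one extreme but real value, and compares Ppk before and after filtering. The data, spec limits, and helper names are hypothetical, and quartiles are computed with NumPy's default interpolation, which may differ slightly from JMP's.

```python
import numpy as np

def ppk(x, lsl, usl):
    """Overall capability index from the sample mean and standard deviation."""
    x = np.asarray(x, dtype=float)
    return min(usl - x.mean(), x.mean() - lsl) / (3 * x.std(ddof=1))

def iqr_filter(x, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return x[(x >= q1 - k * iqr) & (x <= q3 + k * iqr)]

LSL, USL = 8.0, 12.0  # hypothetical spec limits
# The last value is extreme but assumed to be a real, valid measurement.
data = np.array([10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 11.5])

kept = iqr_filter(data)
ppk_all = ppk(data, LSL, USL)
ppk_filtered = ppk(kept, LSL, USL)

print(f"points removed by the filter: {len(data) - len(kept)}")
print(f"Ppk with all data:  {ppk_all:.2f}")
print(f"Ppk after filtering: {ppk_filtered:.2f}")
```

The filtered index is several times larger than the unfiltered one, yet the process really did produce the extreme value, so the filtered number overstates what the process can be expected to deliver.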

Do you now understand? Or do you have other questions regarding outlier tests?
 

Miner

Forum Moderator
Leader
Admin
Is that because it is considered proprietary, or because you cannot find an option to upload it? If the former, can you add or subtract a constant to all the data to anonymize it? If the latter, once you make a total of 5 posts, you will be able to upload files.
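
A quick sketch of why adding a constant is a safe way to anonymize: as long as the specification limits are shifted by the same constant, the distance from the mean to each limit and the standard deviation are both unchanged, so the capability index is unchanged too. The offset, data, and limits below are hypothetical.

```python
import numpy as np

def ppk(x, lsl, usl):
    """Overall capability index from the sample mean and standard deviation."""
    x = np.asarray(x, dtype=float)
    return min(usl - x.mean(), x.mean() - lsl) / (3 * x.std(ddof=1))

LSL, USL = 8.0, 12.0   # hypothetical real spec limits
data = np.array([10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0])

offset = 137.5         # hypothetical constant known only to the data owner
shifted_data = data + offset

original = ppk(data, LSL, USL)
# The spec limits must be shifted by the same constant as the data.
anonymized = ppk(shifted_data, LSL + offset, USL + offset)

print(f"original Ppk:   {original:.4f}")
print(f"anonymized Ppk: {anonymized:.4f}")
```

Shifting the data without shifting the limits would change the index, so both need to be posted in shifted form.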
 