Cp/Cpk vs Pp/Ppk (Short term using population sigma) - Formulas to use?

LoudRed · Jan 7, 2002

OK, Now I'm totally confused about Cp/Cpk, Pp/Ppk.

What I'm messed up on is what formulas to use, and what symbol is for which method.

I know there are the two regular ways to calculate standard deviation. That is, using the "n" denominator for population SD, and "n-1" for sample SD. I also know there is a Sigma (Rbar/d2), and a Sigma-hat. Does either of these correspond to the two regular ways when calculating Cp/Cpk or Pp/Ppk? If not, then what are the formulas?

Also of interest to me concerns individual data and moving range.
Most of the data we've collected are individual data and not multiple readings for sub-group data. Below I've listed individual data for 30 pieces. At the point these data were taken, it was a population. I've attached an EXCEL file with all the data if you want to look at it along with my formulas. Please note that if you do, I only used the "mean - LSL" for my Cpk/Ppk formulas as all the data shows it to be in the bottom half of the spec.

Piece  L  Moving Rng
1 79.4
2 79.9 0.5
3 80.7 0.8
4 81.2 0.5
5 81.2 0.0
6 82.5 1.3
7 81.2 1.3
8 80.5 0.7
9 81.5 1.0
10 80.7 0.8
11 80 0.7
12 81 1.0
13 80.3 0.7
14 80.7 0.4
15 80.9 0.2
16 80.7 0.2
17 81 0.3
18 81.7 0.7
19 82.2 0.5
20 81 1.2
21 81 0.0
22 80.8 0.2
23 81 0.2
24 79 2.0
25 83.5 4.5
26 80.2 3.3
27 80.4 0.2
28 79.8 0.6
29 80.2 0.4
30 78 2.2
Min 78.00 0
Max 83.50 4.5
Mean 80.74 0.91
Median 80.75

USL 100
LSL 70

Std. Dev (population) 1.013
Std. Dev (sample) 1.030
Std. Dev. Rbar/d2 0.455

Cp (Rbar/d2) 10.985
Cp (pop) 4.937
Pp 4.854
Cpk (Rbar/d2) 7.865
Cpk (pop) 3.535
Ppk 3.475

First of all, did I calculate using the correct formulas? If so, I don't understand why to use the SD (sample) for Pp/Ppk when that is taken from a population. I've also grouped the data into 15 subsets of 2, and 10 subsets of 3 and gotten the following results. Are these correct also?

If I've used all the correct formulas, then what I got for Cp/Cpk should use the (Rbar/d2) formulas, while the Pp/Ppk uses the population SD (i.e., "n" in the denominator). Please advise either way. Thanks for any assistance.
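
For anyone who wants to check the spreadsheet numbers, here is a minimal Python sketch that recomputes the three sigma estimates and the indices from the 30 readings listed above. It assumes the moving ranges are taken over consecutive pairs of readings, so the control chart constant is d2 = 1.128 (the standard value for ranges of 2); the variable and function names are illustrative and do not come from the attached EXCEL file. With d2 = 1.128 the Rbar/d2 sigma comes out near 0.807, whereas the posted 0.455 is what you get by dividing the average moving range by 2.

```python
import statistics

# The 30 individual readings from the table above, in production order.
readings = [79.4, 79.9, 80.7, 81.2, 81.2, 82.5, 81.2, 80.5, 81.5, 80.7,
            80.0, 81.0, 80.3, 80.7, 80.9, 80.7, 81.0, 81.7, 82.2, 81.0,
            81.0, 80.8, 81.0, 79.0, 83.5, 80.2, 80.4, 79.8, 80.2, 78.0]
USL, LSL = 100.0, 70.0
D2_N2 = 1.128  # assumption: standard d2 constant for ranges of 2 readings

mean = statistics.fmean(readings)

# Moving range = absolute difference between consecutive readings.
moving_ranges = [abs(b - a) for a, b in zip(readings, readings[1:])]
mr_bar = statistics.fmean(moving_ranges)

sigma_within = mr_bar / D2_N2           # Rbar/d2 estimate
sigma_n = statistics.pstdev(readings)   # "n" in the denominator
sigma_n1 = statistics.stdev(readings)   # "n-1" in the denominator


def spread_index(sigma):
    """(USL - LSL) / 6 sigma: Cp or Pp depending on which sigma is passed."""
    return (USL - LSL) / (6 * sigma)


def nearest_limit_index(sigma):
    """min(USL - mean, mean - LSL) / 3 sigma: Cpk or Ppk."""
    return min(USL - mean, mean - LSL) / (3 * sigma)


cp, cpk = spread_index(sigma_within), nearest_limit_index(sigma_within)
pp, ppk = spread_index(sigma_n1), nearest_limit_index(sigma_n1)

print(f"mean={mean:.2f}  MRbar={mr_bar:.2f}")
print(f"sigma Rbar/d2={sigma_within:.3f}  sigma(n)={sigma_n:.3f}  sigma(n-1)={sigma_n1:.3f}")
print(f"Cp={cp:.3f}  Cpk={cpk:.3f}  Pp={pp:.3f}  Ppk={ppk:.3f}")
```

Under these assumptions Pp and Ppk agree with the posted 4.854 and 3.475, while Cp and Cpk come out at roughly 6.2 and 4.4 rather than 10.985 and 7.865.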

Jim Biz · Jan 8, 2002

Bump-up for our stat gurus:

My memory is really fuzzy here - I do have a software program that figures Ppk for me but our customers still rely on Cpk for information so I seldom use it.

Can anyone else give a clear explanation??

Atul Khandekar · Jan 8, 2002

Cp/Cpk...

Cp/Cpk use sigma estimated by the Rbar/d2 formula.
For Pp/Ppk, sigma is calculated from all the individual readings using the sample standard deviation formula:

s = sqrt( sum( (x_i - xbar)^2 ) / (n - 1) )

The difference is the way sigma is calculated.

- Ppk attempts to answer the question "does my current production sample meet specification?" Process performance indices should only be used when statistical control cannot be evaluated.
- On the other hand, Cpk attempts to answer the question "does my process in the long run meet specification?" Process capability evaluation can only be done after the process is brought into statistical control. The reason is simple: Cpk is a prediction, and one can only predict something that is stable.

You can get more details in the article 'Measuring Your Process Capability' at https://www.symphonytech.com/articles/processcapability.htm

-Atul.

Al Dyer · Jan 8, 2002

Cpk: Long term using Rbar/d2

Ppk: Short term using population sigma

Dave Strouse · Jan 8, 2002

UGHHHHHHHHHHHH!!!

This raises some of the most troubling questions around.
I'll take a first cut at some of them.

First of all, I'd like to find every author who ever published anything about a "population" and "sample" standard deviation, sit them in an electric chair and gleefully pull the handle. They really don't have a whole lot to do with populations and samples. These are merely statistics used to describe the data.

The reasons there are two main expressions is that one is a "biased estimator" and the other is not. If you want to know what that means, be prepared to study at least one or two semesters of mathematical statistics and even then you may be confused.
DON'T GO THERE! :frust:

Let's get real in the application sense: "How much real practical difference is there between using a standard deviation with n in the denominator versus one with (n-1)?" With thirty points you are under 2% in disagreement. Will you send a rocket off course and miss the moon by this "error" either way? NO, and in fact your daily production won't be affected in any way. My advice in industrial situations is to always use the (n-1) formula.
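
A quick check on the size of that disagreement: the two formulas differ only by the factor sqrt(n/(n-1)), so the gap depends on the number of points alone. A minimal Python sketch (the function name is just illustrative):

```python
import math


def n_vs_n_minus_1_gap(n: int) -> float:
    """Relative amount by which the (n-1) standard deviation exceeds the n version."""
    return math.sqrt(n / (n - 1)) - 1.0


# Show how quickly the gap shrinks as the number of points grows.
for n in (5, 30, 100):
    print(f"n={n:3d}: {n_vs_n_minus_1_gap(n):.1%}")
```

For n = 30 the gap is about 1.7%, consistent with the "under 2%" figure.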

Another point raised is Cpk versus Ppk. The intent is to show long term "capability" versus short term.

For short term an average dispersion statistic of the "within subgroups" data needs to be used. For individual data "groups" use the standard deviation estimate given by average moving range adjusted by the control chart factors. When you have actual subgroups use standard deviation estimate given by range adjusted by the control chart factors.

For long term a dispersion statistic of all data needs to be used to capture the variability "between subgroups". For individual data "groups" or actual groups use the standard deviation estimate given by the "sample" standard deviation.
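
To make the two estimates concrete, here is a rough Python sketch using the 30 readings from the first post, split into consecutive subgroups of 3 purely for illustration (whether that grouping is rational for the real process is a separate question). It assumes the standard control chart constants d2 for subgroup sizes 2 to 5; all names are hypothetical.

```python
import statistics

readings = [79.4, 79.9, 80.7, 81.2, 81.2, 82.5, 81.2, 80.5, 81.5, 80.7,
            80.0, 81.0, 80.3, 80.7, 80.9, 80.7, 81.0, 81.7, 82.2, 81.0,
            81.0, 80.8, 81.0, 79.0, 83.5, 80.2, 80.4, 79.8, 80.2, 78.0]

SUBGROUP_SIZE = 3
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}  # standard d2 constants

# Consecutive, non-overlapping subgroups (illustrative grouping only).
subgroups = [readings[i:i + SUBGROUP_SIZE]
             for i in range(0, len(readings), SUBGROUP_SIZE)]

# Within-subgroup spread: average range adjusted by d2.
ranges = [max(g) - min(g) for g in subgroups]
r_bar = statistics.fmean(ranges)
sigma_within = r_bar / D2[SUBGROUP_SIZE]

# Overall spread: (n-1) standard deviation of every reading pooled together.
sigma_overall = statistics.stdev(readings)

print(f"Rbar={r_bar:.2f}  sigma_within={sigma_within:.3f}  "
      f"sigma_overall={sigma_overall:.3f}")
```

The within-subgroup estimate ignores any shift between subgroups, so it is normally the smaller of the two; the gap between them is one rough indication of how much between-subgroup variation is present.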

Davis Bothe has published an 800 plus page book on process capability and the whole concept is confusing. There are as many capability "indices" out there as there are processes it seems.

You should probably ask yourself real hard questions like "What do I intend to do with these indices?"

IMHO they are often used to satisfy artificial reporting requirements. In this sense they allow the clueless to keep the blind informed. :biglaugh:

A proper use of Cpk etc. in my opinion would be to measure process improvements as they are instituted. Using these measures in a comparative sense is perhaps useful.

One place I worked took Cpk measurements of 5 to 10 different assembly lines in each plant and averaged the Cpks. Trust me that this is mathematically incorrect. Yet, when I tried to make management aware of this, I was told that they knew it was wrong, but upper management wanted a single number to evaluate! :bonk:

BTW, there is a way to combine different processes Cpk into an overall Cpk, but it is not by the simple average.

As far as grouping the data together, what rational basis have you to do this? Again, I think you want to know what you are doing with the answers before you just crunch numbers.

Finally, I don't know what your process is, but I doubt if thirty points is anywhere near enough to estimate long term variability. I'm moving my office to another building so don't have any references available, but I think the common practice is to have a minimum of 100 points to talk about long term capability. Of course this number will depend on if you feel you have captured the variability long term of the process or not.

MarkR · Jan 10, 2002

Cp/Cpk vs. Pp/Ppk

If your data comes from a control chart and represents a reasonable amount of production time (I agree that about 100 data points is good) then you always use Cp/Cpk. The Rbar/d2 estimator factors out the long term variation by using Rbar. Rbar is based on differences within subgroups, i.e., short term variability.

If your data comes from a short term study, where you went into the shop and gathered consecutive parts from the process, then always use Pp/Ppk. Consecutive parts represent only short term variation.

In summary:

Data from control charts uses Cp/Cpk
Data from short term studies uses Pp/Ppk

AJLenarz · Jan 10, 2002

CPK vs. PPK

I have always enjoyed the "QS" spin on the definitions. To me it just made so much sense and makes this whole thing cut and dried.

Cpk – The capability index for a stable process. The estimate of sigma is based on within subgroup variation. Cpk can only be calculated when the process is stable.

Ppk – The performance index. The estimate of sigma is based on total variation. Ppk is to be calculated if less than 100 samples or when the process is chronically unstable but meeting the specifications and in a predictable pattern.

:bonk:

LoudRed · Jan 11, 2002

My thanks to all of you who answered. I appreciate the help.

I sometimes feel that I could really relate to the one statement about the clueless leading the blind.

Again, thanks for the help everybody.

KenK - 2009 · Jan 23, 2002

I would tend to see AJLenarz's description of Cpk and Ppk as being the most "usable". In my own words:

Ppk is the actual PERFORMANCE of your process, incorporating all observed variation.

Cpk is CAPABILITY of your process IF all instability was removed (or ignored).

Cp is the best your current process could do IF all instability was removed AND it was centered.
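
The "AND it was centered" part can be seen in a small Python sketch. The function names and the sigma value are illustrative assumptions; the spec limits and the off-center mean are the 70 to 100 and 80.74 from the first post.

```python
def cp(usl: float, lsl: float, sigma: float) -> float:
    """Spec width over 6 sigma; ignores where the process mean sits."""
    return (usl - lsl) / (6 * sigma)


def cpk(mean: float, usl: float, lsl: float, sigma: float) -> float:
    """Distance from the mean to the nearer spec limit over 3 sigma."""
    return min(usl - mean, mean - lsl) / (3 * sigma)


USL, LSL = 100.0, 70.0
SIGMA = 1.0  # arbitrary illustrative value, not taken from the thread

potential = cp(USL, LSL, SIGMA)
centered = cpk(85.0, USL, LSL, SIGMA)     # mean exactly mid-spec
off_center = cpk(80.74, USL, LSL, SIGMA)  # mean sitting toward the LSL

print(f"Cp={potential:.2f}  Cpk(centered)={centered:.2f}  "
      f"Cpk(off-center)={off_center:.2f}")
```

With the mean at mid-spec, Cpk equals Cp; as the mean drifts toward either limit, Cpk falls while Cp stays the same.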

AJ mentioned that Cpk can only be used if the process is stable. I'm not sure what that means. If it means that your control charts shouldn't be "alarming" then I agree, but there will always be some amount of apparent instability in the process - variation between subgroups.

When I say "apparent" I mean that what looks like instability may really be just random between subgroup variation. That's what the control chart is supposed to help separate.

In general I recommend people calculate and report both Cpk AND Ppk since they mean two different things and both provide information.

I also strongly recommend against reporting only Cpk to customers (unless they specifically ask for just Cpk). In my mind, reporting only Cpk is sort of cheating - making the customer think the process is more capable than it really is by ignoring between-subgroup variation.

By the way, I've never felt comfortable about the terms "short-term variation" and "long-term variation" since it seems they can easily be misunderstood. I prefer the terms "within subgroup variation" and "overall variation".

MINITAB users may have noticed that release 12 used short-term & long-term when referring to the different variance forms, but release 13 switched to using within and overall instead. I applaud that change.

Rick Goodson · Jan 23, 2002

Interesting 'discussion' so far. Let me see if I can stir up the pot.

Why are we interested in process capability? Under the Deming philosophy of never-ending quality improvement it would be to seek methods to continually reduce '6 sigma'. In the automotive arena (read AIAG) they have the same basic philosophy. From the AIAG SPC manual page 1 "First, gathering data and using statistical methods to interpret them are not ends in themselves. The overall aim should be to increase understanding of the reader's processes. It is very easy to become technical experts without realizing any improvements. Increased knowledge should become a basis for action". Now with that said (and I am sure there will be some divergent opinions on the veracity of that statement) the reason for using Cpk or Ppk can be discussed.

The difference between the two indices lies in the denominator, 6 sigma hat sub R-bar/d2 versus 6 sigma hat sub s (reference AIAG SPC page 80). Please note that the term 'hat' is used in both formulas. Statistically speaking, hat means an estimate; therefore it is based not on the whole population but only on a sample. As Dave Strouse pointed out, this population/sample thing is just a red herring that confuses people. It all has to do with how 'sigma' is calculated. Even AIAG tries to confuse people with the terminology. Nevertheless...

PROCESS CAPABILITY is defined as the 6 sigma range of a process's inherent variation, where sigma is usually estimated by R-bar/d2 and where inherent variation is defined as that portion of process variation due to common causes only (reference page 79 & 80). PROCESS PERFORMANCE is defined as the 6 sigma range of a process's total variation where sigma is usually estimated by s, the sample standard deviation (reference page 79 & 80). So...

Process capability is Cpk, Process performance is Ppk.
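
Written out, the two sigma estimates and the four indices referred to above take the following standard form. The notation is an assumption for illustration: x-bar is the overall mean, R-bar the average subgroup (or moving) range, d2 the control chart constant for the subgroup size, and n the total number of readings.

```latex
\hat{\sigma}_{\bar{R}/d_2} = \frac{\bar{R}}{d_2},
\qquad
\hat{\sigma}_{s} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}

C_p = \frac{USL - LSL}{6\,\hat{\sigma}_{\bar{R}/d_2}},
\qquad
C_{pk} = \min\!\left(
  \frac{USL - \bar{x}}{3\,\hat{\sigma}_{\bar{R}/d_2}},\;
  \frac{\bar{x} - LSL}{3\,\hat{\sigma}_{\bar{R}/d_2}}
\right)

P_p = \frac{USL - LSL}{6\,\hat{\sigma}_{s}},
\qquad
P_{pk} = \min\!\left(
  \frac{USL - \bar{x}}{3\,\hat{\sigma}_{s}},\;
  \frac{\bar{x} - LSL}{3\,\hat{\sigma}_{s}}
\right)
```

The only difference between each C index and its P counterpart is which sigma estimate sits in the denominator.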

Process capability is an idealistic state assuming that all variation is due to common cause only and the process is centered. It is measured by taking variation measurements over TIME from a process that is statistically stable (only common cause variation present).

Process performance is the actual state of the process at some moment in time. In essence a snap shot of the process now. In a minute, hour, day, or week later it probably will be different.

Cpk is a historical record of the process used as a predictor of the future. Ppk is how the process is actually performing at the time you made the measurements.

Regards,

Rick
