Cp/Cpk vs Pp/Ppk (Short term using population sigma) - Formulas to use?
OK, now I'm totally confused about Cp/Cpk and Pp/Ppk. What I'm mixed up on is which formulas to use, and which symbol goes with which method.
I know there are the two regular ways to calculate standard deviation. That is, using the "n" denominator for population SD, and "n-1" for sample SD. I also know there is a Sigma (Rbar/d2), and a Sigma-hat. Does either of these correspond to the two regular ways when calculating either Cp/Cpk or Pp/Ppk? If not, then what are the formulas?
Also of interest to me concerns individual data and moving range.
Most of the data we've collected are individual data and not multiple readings for sub-group data. Below I've listed individual data for 30 pieces. At the point these data were taken, it was a population. I've attached an EXCEL file with all the data if you want to look at it along with my formulas. Please note that if you do, I only used the "mean - LSL" for my Cpk/Ppk formulas as all the data shows it to be in the bottom half of the spec.
First of all, did I calculate using the correct formulas? If so, I don't understand why to use the SD (sample) for Pp/Ppk when that is taken from a population. I've also grouped the data into 15 subsets of 2, and 10 subsets of 3 and gotten the following results. Are these correct also?
If I've used all the correct formulas, then what I got for Cp/Cpk should use the (Rbar/d2) formulas, while the Pp/Ppk uses the population SD (i.e., "n" in the denominator). Please advise either way. Thanks for any assistance.
My memory is really fuzzy here - I do have a software program that figures Ppk for me, but our customers still rely on Cpk for information so I seldom use it.
Can anyone else give a clear explanation?
__________________
Regards
Jim
"Chance is a word void of sense; nothing can exist without a cause."
Voltaire
Cp/Cpk use sigma calculated by the RBar/d2 formula.
For Ppk, sigma is calculated as the sample standard deviation of all the readings, using the formula:
s = sqrt( sum( (x_i - xbar)^2 ) / (n - 1) )
The difference is the way sigma is calculated.
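To make the two sigma estimates concrete, here is a minimal Python sketch. It assumes equal-size subgroups, the standard tabulated d2 constants, and the (n-1) sample standard deviation for the overall estimate. The function names, the readings, and the spec limits are made up for illustration and are not taken from the attached file.

```python
import statistics

# d2 control chart constants for subgroup sizes 2 to 5 (standard SPC tables)
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}

def sigma_within(subgroups):
    """Within-subgroup sigma estimate, Rbar/d2 (the Cp/Cpk sigma)."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    rbar = statistics.mean(max(g) - min(g) for g in subgroups)
    return rbar / D2[n]

def sigma_overall(subgroups):
    """Overall sigma estimate from all readings pooled (the Pp/Ppk sigma)."""
    values = [x for g in subgroups for x in g]
    return statistics.stdev(values)  # (n-1) denominator

def capability(mean, sigma, lsl, usl):
    """Return (Cp-style, Cpk-style) indices for whichever sigma is passed in."""
    spread_index = (usl - lsl) / (6 * sigma)
    centered_index = min(usl - mean, mean - lsl) / (3 * sigma)
    return spread_index, centered_index

# Hypothetical data: three subgroups of two readings, spec limits 5 to 17
subgroups = [[10, 12], [11, 13], [9, 11]]
mean = statistics.mean(x for g in subgroups for x in g)
cp, cpk = capability(mean, sigma_within(subgroups), lsl=5, usl=17)
pp, ppk = capability(mean, sigma_overall(subgroups), lsl=5, usl=17)
print(f"Cp={cp:.3f} Cpk={cpk:.3f} Pp={pp:.3f} Ppk={ppk:.3f}")
```

The index formulas are identical for both pairs; only the sigma passed in changes.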
- Ppk attempts to answer the question "does my current production sample meet specification?" Process performance indices should only be used when statistical control cannot be evaluated.
- On the other hand, Cpk attempts to answer the question "does my process in the long run meet specification?" Process capability evaluation can only be done after the process is brought into statistical control. The reason is simple: Cpk is a prediction, and one can only predict something that is stable.
This raises some of the most troubling questions around.
I'll take a first cut at some of them.
First of all, I'd like to find every author who ever published anything about a "population" and "sample" standard deviation, sit them in an electric chair and gleefully pull the handle. They really don't have a whole lot to do with populations and samples. These are merely statistics used to describe the data.
The reason there are two main expressions is that one is a "biased estimator" and the other is not. If you want to know what that means, be prepared to study at least one or two semesters of mathematical statistics, and even then you may be confused.
DON'T GO THERE!
Let's get real in the application sense: "How much real practical difference is there between using a standard deviation with n in the denominator versus one with (n-1)?" With thirty points the two disagree by under 2%. Will you send a rocket off course and miss the moon by this "error" either way? NO, and in fact your daily production won't be affected in any way. My advice in industrial situations is to always use the (n-1) formula.
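The "under 2%" figure can be checked directly: for the same data, the (n-1) standard deviation is larger than the n version by a factor of sqrt(n/(n-1)), whatever the readings are. A short Python sketch (the function name is just for illustration):

```python
import math

def sd_ratio(n):
    """Ratio of the (n-1)-denominator SD to the n-denominator SD for the same data."""
    return math.sqrt(n / (n - 1))

# Percentage by which the two standard deviations disagree at several sample sizes
for n in (5, 10, 30, 100):
    print(n, f"{(sd_ratio(n) - 1) * 100:.2f}%")
```

For n = 30 the ratio is about 1.017, so the two differ by roughly 1.7%, and the gap shrinks as n grows.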
Another point raised is Cpk versus Ppk. The intent is to show long term "capability" versus short term.
For short term an average dispersion statistic of the "within subgroups" data needs to be used. For individual data "groups" use the standard deviation estimate given by the average moving range adjusted by the control chart factors. When you have actual subgroups use the standard deviation estimate given by the average range adjusted by the control chart factors.
For long term a dispersion statistic of all data needs to be used to capture the variability "between subgroups". For individual data "groups" or actual groups use the standard deviation estimate given by the "sample" standard deviation.
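For individual readings, the two estimates above can be sketched in Python as follows. This assumes a moving range of span 2 (consecutive readings), for which the tabulated d2 factor is 1.128. The function names and the drifting data are hypothetical, chosen only to show how the two estimates pull apart.

```python
import statistics

D2_SPAN_2 = 1.128  # d2 factor for moving ranges of two consecutive readings

def sigma_short_term(individuals):
    """Average moving range divided by d2 (short-term estimate)."""
    moving_ranges = [abs(b - a) for a, b in zip(individuals, individuals[1:])]
    return statistics.mean(moving_ranges) / D2_SPAN_2

def sigma_long_term(individuals):
    """Sample standard deviation of all readings (long-term estimate)."""
    return statistics.stdev(individuals)  # (n-1) denominator

# Hypothetical process that drifts steadily upward by one unit per piece
drifting = [1, 2, 3, 4, 5, 6]
print(sigma_short_term(drifting), sigma_long_term(drifting))
```

On the drifting series the moving-range estimate only sees the piece-to-piece steps, while the sample standard deviation also picks up the drift, so the long-term figure comes out larger.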
Davis Bothe has published an 800 plus page book on process capability and the whole concept is confusing. There are as many capability "indices" out there as there are processes it seems.
You should probably ask yourself real hard questions like "What do I intend to do with these indices?"
IMHO they are often used to satisfy artificial reporting requirements. In this sense they allow the clueless to keep the blind informed.
A proper use of Cpk etc., in my opinion, would be to measure process improvements as they are instituted. Using these measures in a comparative sense is perhaps useful.
One place I worked took Cpk measurements of 5 to 10 different assembly lines in each plant and averaged the Cpks. Trust me that this is mathematically incorrect. Yet, when I tried to make management aware of this, I was told that they knew it was wrong, but upper management wanted a single number to evaluate!
BTW, there is a way to combine different processes' Cpks into an overall Cpk, but it is not by the simple average.
As far as grouping the data together, what rational basis have you to do this? Again, I think you want to know what you are doing with the answers before you just crunch numbers.
Finally, I don't know what your process is, but I doubt if thirty points is anywhere near enough to estimate long term variability. I'm moving my office to another building so don't have any references available, but I think the common practice is to have a minimum of 100 points to talk about long term capability. Of course this number will depend on if you feel you have captured the variability long term of the process or not.
If your data comes from a control chart and represents a reasonable amount of production time (I agree that about 100 data points is good) then you always use Cp/Cpk. The Rbar/d2 estimator factors out the long term variation by using Rbar. Rbar is based on differences within subgroups, i.e., short term variability.
If your data comes from a short term study, where you went into the shop and gathered consecutive parts from the process, then always use Pp/Ppk. Consecutive parts represent only short term variation.
In summary:
Data from control charts uses Cp/Cpk
Data from short term studies uses Pp/Ppk
I have always enjoyed the "QS" spin on the definitions. To me it just made so much sense, and it makes this whole thing cut and dried.
Cpk – The capability index for a stable process. The estimate of sigma is based on within subgroup variation. Cpk can only be calculated when the process is stable.
Ppk – The performance index. The estimate of sigma is based on total variation. Ppk is to be calculated if there are fewer than 100 samples, or when the process is chronically unstable but meeting the specifications and in a predictable pattern.