# Calculating Cpk on Non-Normal Data Distribution

#### d.conroy

##### Starting to get Involved
Hi All,
So I've been doing validation work for a while now, so I'm reasonably familiar with Cpk and other process indicators. However, our ongoing validation has led me into a problem.

I'm calculating a Cpk on a heat sealing process based on seal peel strength evaluations. There is a lower limit of 1.5 N/15 mm strip with no upper limit.

When I look at the results, there is a mean of 3.4 N with an SD of 0.78 N, so not good: a Cpk of only 0.8.
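With only a lower limit, Cpk reduces to the one-sided index (mean − LSL)/(3 × SD). A minimal Python sketch reproducing the 0.8 quoted above (the function name `cpk_lower` is hypothetical):

```python
def cpk_lower(mean: float, sd: float, lsl: float) -> float:
    """One-sided Cpk for a characteristic that has only a lower specification limit."""
    # Distance from the mean down to the LSL, in units of 3 standard deviations.
    return (mean - lsl) / (3 * sd)


# Figures quoted in the post: LSL = 1.5 N/15 mm, mean = 3.4 N, SD = 0.78 N.
print(round(cpk_lower(3.4, 0.78, 1.5), 2))  # → 0.81
```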

But then I look at a distribution plot of the results and see that the distribution has a large tail on the positive side of the mean.

This tail is giving me a large SD, which is causing a low Cpk.
This does not seem reasonable to me; the outliers/tail above the mean are making me fail the lower-specification Cpk.

What can I do in this situation?

If I calculate the SD only on the lower side of the mean, i.e.

$$\text{SD}_{\text{lower}} = \sqrt{\frac{\sum_{x_i < 3.4} (3.4 - x_i)^2}{n_{\text{below}}}}$$

where $n_{\text{below}}$ is the number of results less than 3.4, I then get an SD of 0.37 and a Cpk of 1.7.
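The lower-side calculation described above (squared deviations of the results below 3.4, averaged over the number of such results, then square-rooted) can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the centre value is the overall mean, as in the post:

```python
import math


def lower_semi_sd(data, centre):
    """Root-mean-square deviation of the observations that lie below `centre`."""
    below = [x for x in data if x < centre]
    if not below:
        raise ValueError("no observations below the centre value")
    # Divide by the count of points below the centre (not n - 1), following the post.
    return math.sqrt(sum((centre - x) ** 2 for x in below) / len(below))


def cpk_lower(mean, sd, lsl):
    """One-sided Cpk against a lower specification limit."""
    return (mean - lsl) / (3 * sd)


# With the post's lower-side SD of 0.37 N:
print(round(cpk_lower(3.4, 0.37, 1.5), 2))  # → 1.71
```

Points above the mean never enter `lower_semi_sd`, so the long upper tail no longer inflates the SD used against the lower limit.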

Is this reasonable to do? What else could I do?

Any help would be appreciated.

#### Bev D

##### Heretical Statistician
Staff member
Super Moderator
Yes, this is a reasonable approach. It meets the intent of capability indexes.

#### Darius

##### Quite Involved in Discussions
I would...
Code:
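```vb
Function DS_dwn(R As Range) As Single
    ' One-sided (lower) SD estimate from percentiles of the data
    Dim Median As Single, ds1 As Single, ds2 As Single, ds3 As Single
    Median = Application.WorksheetFunction.Median(R)
    ' Median minus the lower bound of the central 68.3 % band = 1 SD
    ds1 = Median - Application.WorksheetFunction.Percentile(R, (1 - 0.683) / 2)
    ' Median minus the lower bound of the central 95.5 % band = 2 SD
    ds2 = (Median - Application.WorksheetFunction.Percentile(R, (1 - 0.955) / 2)) / 2
    ' Median minus the lower bound of the central 99.7 % band = 3 SD
    ds3 = (Median - Application.WorksheetFunction.Percentile(R, (1 - 0.997) / 2)) / 3
    ' Average the three estimates
    DS_dwn = (ds1 + ds2 + ds3) / 3
End Function
```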
Maybe strange, but it was handy for me: a non-parametric way to estimate the SD one-way (up or down; in this case down) using the percentiles of the normal distribution. I developed it to modify Chauvenet's criterion for outlier detection.
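For readers without Excel, here is a rough Python port of the `DS_dwn` VBA function above. The name `ds_down` is hypothetical; it assumes NumPy, whose default `np.percentile` interpolation is the same linear scheme as Excel's `PERCENTILE` (`PERCENTILE.INC`):

```python
import numpy as np


def ds_down(data):
    """Percentile-based estimate of the lower-side standard deviation.

    For a normal distribution, the median minus the (1 - c)/2 percentile equals
    k standard deviations, where c is the central coverage of the +/- k sigma
    band (about 68.3 %, 95.5 % and 99.7 % for k = 1, 2, 3).
    """
    x = np.asarray(data, dtype=float)
    median = np.median(x)
    estimates = []
    for k, coverage in ((1, 0.683), (2, 0.955), (3, 0.997)):
        lower = np.percentile(x, 100 * (1 - coverage) / 2)
        estimates.append((median - lower) / k)
    # Average the three one-sided estimates, as the VBA version does.
    return float(np.mean(estimates))
```

On a large normal sample the result should land close to the true SD; on right-skewed data it comes out smaller than the ordinary SD, because only the lower tail is used.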

#### Darius

##### Quite Involved in Discussions
Or, if you want it by IQR (interquartile range):

With DATA_RANGE as the data range, the SD estimated from the lower tail is:

`=(MEDIAN(DATA_RANGE)-QUARTILE(DATA_RANGE,1))/0.67448`

Derivation:

For a normal distribution, IQR = 2 × 0.67448 × SD, since each quartile lies 0.67448 SD from the median.

If you take only the lower part, double the Median − Q1 distance to simulate an IQR in which the upper half behaves like the lower half; call it IQR'.

So:

$$\text{IQR}' = 2\,(\text{Median} - Q_1) = 2 \times 0.67448 \times \text{SD}$$

and therefore

$$\text{SD} = \frac{\text{Median} - Q_1}{0.67448}$$
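The same quartile-based estimate in Python, as a sketch using only the standard library. The name `sd_from_lower_quartile` is hypothetical; `quantiles(..., method="inclusive")` uses the same interpolation as Excel's `QUARTILE` (`QUARTILE.INC`), and the 0.67448 constant is recovered as the 75th-percentile z-value of the standard normal:

```python
from statistics import NormalDist, median, quantiles

# z-value of the 75th percentile of the standard normal (about 0.67448).
Z75 = NormalDist().inv_cdf(0.75)


def sd_from_lower_quartile(data):
    """Lower-side SD estimate: (median - Q1) / 0.67448."""
    q1 = quantiles(data, n=4, method="inclusive")[0]
    return (median(data) - q1) / Z75
```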

#### Statistical Steven

##### Statistician
Staff member
Super Moderator
Just to add to the fray: I am not a believer in penalizing yourself for having a long tail away from the specification. There are several approaches I use:

1. A log transformation to make the data more normal.
2. A method similar to Darius's: a non-parametric Cpk metric.
3. A 5% winsorizing of the data, that is, replacing the upper and lower 5% of the data with the 5th and 95th percentile values. How does the Cpk change?
4. If you want to do some heavy lifting, you can use non-parametric tolerance intervals when the sample size is large. You can then predict how many PPM outside the lower specification to expect.
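The first three approaches in the list above can be sketched in Python with NumPy. All function names are hypothetical. The percentile Cpk uses the commonly cited form (median − LSL)/(median − P0.135%), where P0.135% stands in for the −3 SD point; this is an assumption about which non-parametric metric is meant, and with small samples that percentile sits at or near the minimum and is very unstable. Winsorizing here clamps the extremes to the cut-off percentiles (dropping them instead would be trimming):

```python
import numpy as np


def cpk_lower(mean, sd, lsl):
    """One-sided Cpk against a lower specification limit."""
    return (mean - lsl) / (3 * sd)


def cpk_log(data, lsl):
    """Approach 1: Cpk of ln(data) against ln(LSL). All values must be positive."""
    y = np.log(np.asarray(data, dtype=float))
    return cpk_lower(y.mean(), y.std(ddof=1), np.log(lsl))


def cpk_percentile(data, lsl):
    """Approach 2: percentile-based Cpk, (median - LSL) / (median - P0.135%)."""
    x = np.asarray(data, dtype=float)
    med = np.median(x)
    # For normal data the 0.135th percentile falls 3 SD below the median.
    p_low = np.percentile(x, 0.135)
    return (med - lsl) / (med - p_low)


def winsorize(data, fraction=0.05):
    """Clamp the lowest and highest `fraction` of values to the cut-off percentiles."""
    x = np.asarray(data, dtype=float)
    lo, hi = np.percentile(x, [100 * fraction, 100 * (1 - fraction)])
    return np.clip(x, lo, hi)


def cpk_winsorized(data, lsl, fraction=0.05):
    """Approach 3: ordinary one-sided Cpk computed on winsorized data."""
    w = winsorize(data, fraction)
    return cpk_lower(w.mean(), w.std(ddof=1), lsl)
```

Comparing the three results with the plain Cpk on the same data is the useful part: if they agree that the process is capable while the plain Cpk does not, the upper tail is what is dragging the plain figure down.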

There are many ways to handle non-normal data for capability, but as Bev said, it's about remaining true to the intention of capability: at this point in time, do I have evidence of a capable process?

#### Bev D

##### Heretical Statistician
Staff member
Super Moderator
Some very good suggestions on how to handle this situation statistically. As a stats geek I find them really interesting, but only in an academic way. My practical side lets out a large sigh and says "what a lot of work for such a meaningless index".

I would ask a completely different question at this point: why do you need to calculate a Cpk index at all?

#### Statistical Steven

##### Statistician
Staff member
Super Moderator
Bev, let me give you my perspective, though some might not agree. The use of Cpk as an acceptance criterion is an offshoot of Six Sigma. Instead of understanding process variation, this method forces you to answer "How many PPM can I expect?" As I tell people, you can measure Cpk today, measure it again a week later, and get different results. Using Ppk does not change that. I asked a coworker to compare two processes, both with a Cpk of 2.0: one with the mean very close to the specification and low variability, the other with the mean centered but more variability. Which is better? The answer was that they are the same. This is why it's a meaningless index!
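The coworker example is easy to reproduce with numbers. The sketch below assumes a hypothetical two-sided specification of 0 to 24 and two hypothetical processes chosen so that both give Cpk = 2.0; Cp (tolerance width over 6 SD) shows how different they really are:

```python
def cp(sd, lsl, usl):
    """Potential capability: tolerance width over six standard deviations."""
    return (usl - lsl) / (6 * sd)


def cpk(mean, sd, lsl, usl):
    """Cpk: distance from the mean to the nearer limit, in units of 3 SD."""
    return min(usl - mean, mean - lsl) / (3 * sd)


# Hypothetical specification and processes (mean, SD), for illustration only.
LSL, USL = 0.0, 24.0
processes = {
    "A: mean near LSL, low variation": (3.0, 0.5),
    "B: centred, more variation": (12.0, 2.0),
}
for name, (mean, sd) in processes.items():
    print(f"{name}: Cpk = {cpk(mean, sd, LSL, USL):.2f}, Cp = {cp(sd, LSL, USL):.2f}")
```

Both processes report Cpk = 2.00, while Cp is 8.00 for A and 2.00 for B: the single index hides whether the issue is location or spread.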

#### Miner

##### Forum Moderator
Staff member
> Bev, let me give you my perspective, though some might not agree. The use of Cpk as an acceptance criterion is an offshoot of Six Sigma.
Not an offshoot of Six Sigma. I was seeing this from the automotive industry in the early '80s. Six Sigma didn't gain much attention outside Motorola and the early adopters until the mid-'90s.

#### Bev D

##### Heretical Statistician
Staff member
Super Moderator
Miner is correct that Cpk came over from the Japanese in the mid-1980s (Sullivan, L. P., "Reducing Variability: A New Approach to Quality", Quality Progress, July 1984).

The automotive industry unnecessarily complicated the original concept by introducing short-term and long-term variation (Cpk & Ppk).

Steven is correct in that some Six Sigma folks started to corrupt Cpk with a bunch of nonsense about:
• being able to predict very low defect rates based on the tails of the Normal distribution
• espousing that the difference between long-term and short-term capability represented the entitlement of the process; preaching that assignable causes are easy to eliminate and common causes are not
• and then came the 1.5 sigma shift abomination
• and then insisting that everyone had to have Cpk values for every characteristic or they weren't really 'doing' six sigma...
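For readers unfamiliar with the 1.5 sigma shift, the arithmetic behind the familiar "3.4 defects per million" figure is sketched below: a limit 6 SD from the mean is assumed to drift to 4.5 SD, and the one-sided Normal tail beyond 4.5 SD is about 3.4 ppm. This only restates the convention being criticized; it depends entirely on the Normal model holding far out in the tail. The function name is hypothetical:

```python
from statistics import NormalDist


def one_sided_ppm(z):
    """Parts per million beyond a limit that sits z standard deviations from the mean."""
    return NormalDist().cdf(-z) * 1e6


# Nearest limit 6 SD from the mean, no shift assumed.
print(f"no shift:        {one_sided_ppm(6.0):.4f} ppm")
# Convention: the mean drifts 1.5 SD toward the limit, leaving 4.5 SD.
print(f"1.5 sigma shift: {one_sided_ppm(6.0 - 1.5):.1f} ppm")
```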

---
If you are interested in the defect rate (for cost and yield impacts to delivery, or for understanding the inspection needs), then you are far better off counting the number of defects that actually occur. (Pyzdek, Thomas, "Why Normal Distributions Aren't [All That Normal]", Quality Engineering 1995, 7(4), pp. 769-777)
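To make the contrast concrete, the sketch below puts the defect rate actually observed in a sample next to the rate a fitted Normal distribution would predict for the same data. Function names are hypothetical. For skewed data like the seal-strength results the two can disagree considerably, and an observed count only becomes informative once the volume is large enough to see defects at the rate you care about:

```python
from statistics import NormalDist, fmean, stdev


def observed_ppm(data, lsl):
    """Defect rate actually seen: fraction of results below the LSL, in ppm."""
    return sum(x < lsl for x in data) / len(data) * 1e6


def normal_model_ppm(data, lsl):
    """Defect rate predicted by a Normal distribution fitted to the same data, in ppm."""
    return NormalDist(fmean(data), stdev(data)).cdf(lsl) * 1e6
```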

If you are interested in the overall variation of the process in relation to the specifications, you are better off plotting the process results in a multi-vari chart and comparing them to the specification limits. (It is not sensible to reduce variation to a single number.)

If you are interested in the future stability of the process, you are better off plotting the process in a multi-vari and/or control chart and understanding the science, then applying the appropriate controls to maintain stability and detect changes quickly. (Statistics doesn't obviate science.)

Cpk (and all of its illegitimate children) simply can't do any of the above for you...
