# Minitab and Excel Giving Different Cpk Values for the Same Data


#### vikashi

Dear Friends,
Dear Friends,
I have an issue with Cpk calculation. When I calculate Cpk in Excel using R-bar/d2, I get a different value than MINITAB gives (while using the same standard deviation type, i.e. R-bar/d2).

Can you please help me understand what I should do to get the same Cpk value, so I can validate the calculation?
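
For reference, the calculation both tools are aiming at can be sketched in Python. The data and spec limits below are made up for illustration, and the constant d2 = 2.326 assumes every subgroup has exactly 5 samples:

```python
import numpy as np

def cpk_rbar_d2(subgroups, lsl, usl, d2=2.326):
    """Cpk using the within-subgroup sigma estimate R-bar/d2.

    Assumes every subgroup has the same size (d2 = 2.326 is for n = 5).
    """
    ranges = [max(g) - min(g) for g in subgroups]
    rbar = np.mean(ranges)                      # average subgroup range
    sigma_within = rbar / d2                    # estimate of within sigma
    mean = np.mean([x for g in subgroups for x in g])
    cpu = (usl - mean) / (3 * sigma_within)
    cpl = (mean - lsl) / (3 * sigma_within)
    return min(cpu, cpl)

# Illustrative data (not the thread's data): 3 subgroups of 5
data = [[10.1, 10.2, 9.9, 10.0, 10.1],
        [10.0, 9.8, 10.2, 10.1, 9.9],
        [10.2, 10.1, 10.0, 9.9, 10.0]]
print(round(cpk_rbar_d2(data, lsl=9.0, usl=11.0), 3))  # → 2.248
```

If both tools really used the same sigma estimate on the same subgroups, they would agree to rounding; the discrepancy has to come from how each handles the inputs.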

------------------

I am not able to upload the file; there is some error on the forum site.
Below is the data set.

I feel that these N/A values are causing some issue in the MINITAB calculation.

Thanks a lot!
Vikash


#### vikashi

Re: Help: Minitab and Excel Giving Different Cpk Values for the same Data

The Excel sheet is now attached!

#### Attachments

• Book1.xlsx (10.4 KB)
• Untitled.png (17.8 KB)

Re: Help: Minitab and Excel Giving Different Cpk Values for the same Data

I have looked at your data and have a couple of observations.
Sample #2 in Subgroup #23, 57.2478, is an outlier from the other measurements.
Cpk-lower is about 0.93 and Cpk-upper is somewhere around 1.2 to 1.4.
The range on Cpk-upper arises because parts larger than 57.2380 appear to come from a slightly improved process that differs from the rest of the samples.
Cp will be just over 1.00 and, therefore, unacceptable in the long term.

I will let the SPC gurus explain your math problems, but most of it stems from the 'N/A' data, which changes the d2 value, and from the process of sub-grouping.


#### vikashi

Re: Help: Minitab and Excel Giving Different Cpk Values for the same Data

I am aware of those facts from the process point of view. My query is about the Cpk calculation method used in Excel vis-à-vis Minitab.

Regards,
Vikash

> I have looked at your data and have a couple of observations.
> Sample #2 in Subgroup #23, 57.2478, is an outlier from the other measurements.
> Cpk-lower is about 0.93 and Cpk-upper is somewhere around 1.2 to 1.4.
> The range on Cpk-upper arises because parts larger than 57.2380 appear to come from a slightly improved process that differs from the rest of the samples.
> Cp will be just over 1.00 and, therefore, unacceptable in the long term.
>
> I will let the SPC gurus explain your math problems, but most of it stems from the 'N/A' data, which changes the d2 value, and from the process of sub-grouping.


On further examination: if you can understand the 'improved' process in your manufacturing of this dimension, you could get a Cp of a little over 2.0.


#### Allattar

I think the difference in the standard deviation estimates from the two methods is due to the subgroup size not being constant. Where you have missing values, your subgroup size becomes 4; d2 for that subgroup is not 2.326 but 2.059.

When subgroup sizes are unequal, Minitab finds the estimate of sigma from a summation of ranges with respect to the d2 value for each subgroup's size.

The Excel worksheet assumes all subgroups are size 5, which they aren't.

In Minitab, go to the capability analysis dialogue box and click Help; from Help, use the See Also button and go to Methods and Formulas. That section shows the formula that Minitab uses for finding the estimates of standard deviation.
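
The size of this effect can be sketched in Python. The numbers below are illustrative, not the thread's data, and the per-subgroup averaging here is a simplification: Minitab's exact pooling formula (a weighted combination of R_i/d2(n_i)) is documented in its Methods and Formulas, so treat this as a sketch of the idea, not the exact Minitab calculation:

```python
import numpy as np

# Standard d2 control-chart constants, indexed by subgroup size
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}

def sigma_naive(subgroups):
    """Excel-style estimate: R-bar / d2(5), pretending every subgroup has n=5."""
    ranges = [max(g) - min(g) for g in subgroups]
    return np.mean(ranges) / D2[5]

def sigma_per_subgroup(subgroups):
    """Simplified unequal-size estimate: average of R_i / d2(n_i).
    (Minitab uses a weighted pooling; see its Methods and Formulas.)"""
    return np.mean([(max(g) - min(g)) / D2[len(g)] for g in subgroups])

# One subgroup has a missing value, so its d2 should be 2.059, not 2.326
groups = [[10.1, 10.2, 9.9, 10.0, 10.1],
          [10.0, 9.8, 10.2, 10.1],        # one N/A dropped -> n = 4
          [10.2, 10.1, 10.0, 9.9, 10.0]]
print(sigma_naive(groups), sigma_per_subgroup(groups))
```

Using the wrong d2 for the short subgroup understates sigma, which in turn overstates Cpk, so the two spreadsheets diverge exactly where the N/A values occur.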


#### vikashi

Thanks...I will check at my end...

> I think the difference in the standard deviation estimates from the two methods is due to the subgroup size not being constant. Where you have missing values, your subgroup size becomes 4; d2 for that subgroup is not 2.326 but 2.059.
>
> When subgroup sizes are unequal, Minitab finds the estimate of sigma from a summation of ranges with respect to the d2 value for each subgroup's size.
>
> The Excel worksheet assumes all subgroups are size 5, which they aren't.
>
> In Minitab, go to the capability analysis dialogue box and click Help; from Help, use the See Also button and go to Methods and Formulas. That section shows the formula that Minitab uses for finding the estimates of standard deviation.


I plotted the 'run chart' data and found it to be cyclic. The process is therefore not stable. There appear to be two processes fighting each other.
The calculations you are trying to understand are based on a stable process (with a known distribution). Since the process under investigation is not stable, you unfortunately have the condition of 'garbage in, garbage out'.

You need to examine the data that you do have differently to help you understand the underlying processes.


#### Allattar

I'm not seeing that.

Run chart with subgroups across rows:
P-value, clusters: 0.155
P-value, mixtures: 0.845
P-value, trends: 0.256
P-value, oscillation: 0.744

On an X-bar chart using standard rules, tests 5, 1, and 6 are broken in places. Not a huge effect, but not really evidence of cyclic behaviour.

You are right that it isn't stable, but I cannot see two processes there.


I will try to explain what I am ‘seeing’ in the data for all who are reading this thread and following.

(But first, an aside: I was trained by one of the 13 founders of ASQ, Art Bender. He knew Deming and Juran. Young engineers would take their data to him and ask for help in solving quality problems. He always said, "Those are nice numbers, but there is nothing I can do. Come back when you can show me the data." Art knew that if one could 'show' the data, i.e. graph it, they could 'picture' what was going on.
So, to the people in this world who are trying to understand a process: please turn off the calculation crank until you have an idea of what you are looking for. Only then let the math assist you in your quest.
Art did this thousands of times for the engineers and practitioners at Delco Remy. I have also done this thousands of times. You are definitely not alone; you have many good fellow quality associates with similar issues. I am always learning from this forum and other sources.)

See my attached worksheet concerning my responses.

Missing data makes the calculations slightly harder in Excel, but that can be overcome with some futzing.

You attached a 'Process Capability' (normal distribution) chart with your second posting. It displayed (showed) data near the LSL and the USL, yet it calculated Cpk's of 1.5 and 2.0. That just CANNOT happen in reality. I have even seen people argue that they had a Cpk of 1.33 even though they had data points outside the spec limits. That is what the math allows. So either the math is incorrect, which is the question you originally asked, or the data is not what would be expected for the math. I probably did not state clearly enough that the math seemed to be fine, with the exception of the missing data in the Excel calculations. To avoid this issue, I also looked at subsets that only had complete data, as well as all of the subsets.
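
That contradiction can be turned into a quick automated check: a claimed Cpk of 1.33 or more implies roughly four sigmas of margin to the nearest limit, so any observed out-of-spec point contradicts the claim. A minimal sketch with hypothetical numbers (not the thread's data):

```python
import numpy as np

def sanity_check(data, lsl, usl, claimed_cpk):
    """Flag a claimed Cpk that conflicts with observed out-of-spec points.

    Cpk >= 1.33 implies roughly a 4-sigma margin to the nearest limit,
    so any measured point outside the limits contradicts it.
    """
    data = np.asarray(data)
    n_out = int(np.sum((data < lsl) | (data > usl)))
    if claimed_cpk >= 1.33 and n_out > 0:
        return f"suspect: Cpk={claimed_cpk} but {n_out} point(s) out of spec"
    return "no obvious conflict"

# Hypothetical data with one point beyond the USL
vals = [10.0, 10.1, 9.9, 11.2, 10.0]
print(sanity_check(vals, lsl=9.0, usl=11.0, claimed_cpk=1.5))
```

This is exactly the 'show the data before cranking the math' discipline: the capability index is only trustworthy when the plotted data is consistent with it.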

On the second tab, the "Pseudo-Run Chart" worksheet, I rearranged the data to create a 'pseudo' run chart. I call it 'pseudo' because I do not know whether the samples selected, 1-5, were sequential, every other one, or in some other sequence. I also do not know the number of parts between the subgroups. The data is acceptable for a control chart but may distort a run chart. When you view the run chart, you can see a pattern in the data. I accentuated this pattern in the second chart with lines. It reminds me of boring processes where the tool is reset and then wears quickly. Since there appear to be several 'resets' followed by a steady decline in the measured value, I would state that there are (at least) two processes being mixed into the final outcome. In both graphs I ignored the out-of-control point. In any case, the process is out of control, i.e. not stable. 'Look' at the data that you have already collected. Further math cranking will be incorrect: garbage in, garbage out.

On the third tab, the "X-bar-R_Charts" worksheet, I plotted an X-bar and an R chart for the subgroups that contained 5 samples. I did this to keep the D4 value constant. First, you can see that the X-bar values below the X-bar-bar line tend to hug that line. (This may just be influenced viewing.) But on the range chart there are definitely three peaks, and all of the other range values appear to be getting smaller. That seems to be a pattern and, therefore, another indicator of instability, with (at least) two processes being mixed into the final outcome. Why does the range within the subgroups tend to get smaller the longer the process runs?

The fourth tab, the "Normal Probability Chart" worksheet, is my favorite view of data. Normal probability charts plot the measured values against Z-score (or sigma) values. The 'Process Capability' (normal distribution) chart you attached is at the top of the worksheet. It plots the measured values (in cell groupings) against the cumulative percentage of the data (cumulative percentages above 50% are plotted as cumulative percentage minus 50%). This is a Gaussian curve. Data should look like the Gaussian or 'bell' curve if it is normally distributed. Unfortunately, it is difficult for our eyes to discern this condition on such a chart. Another way is to plot the measured values against the Z-score transform of the cumulative percentage value. On this chart, normal data will appear as a straight line that one's eyes can easily discern.
On the third chart, where the outlier is ignored, you can see two distinct straight lines. This implies at least two processes are involved. One has significantly improved capability and is tied somehow to the larger parts being produced (the 'reset' process?).
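
That straight-line view can also be reproduced programmatically: SciPy's `probplot` computes the same ordered-values-versus-normal-quantiles pairs and fits a line through them. The data below is synthetic, simulating a two-process mixture of the kind described above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical mixture: two "processes" with different means and spreads
process_a = rng.normal(57.230, 0.004, 80)
process_b = rng.normal(57.242, 0.001, 20)
mixed = np.concatenate([process_a, process_b])

# probplot returns (theoretical quantiles, ordered values) plus a line fit
(osm, osr), (slope, intercept, r) = stats.probplot(mixed, dist="norm")

# A single normal process gives a correlation r very close to 1; a mixture
# pulls it down, and the plotted points bend into two straight segments.
print(round(r, 4))
```

Plotting `osm` against `osr` (e.g. with matplotlib) gives the same picture as the worksheet: two distinct line segments instead of one.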

Hope this helps.

Joe

#### Attachments

• Cpk.xls (173 KB)