# Ppk study, using pooled standard deviation or not?

Hi all,

I'm trying to perform a Ppk study. (= my employer wants a Ppk number)

But while I was collecting the data and calculating the standard deviation, my coworker asked me a question:
Why don't you use a pooled standard deviation?

Here is how the data I'm working with is organised.
On one production line, each month we run product X 3 times, product Y 5 times, and product Z once (for example).

My take on the Ppk study was to aggregate all the data, calculate the overall stdev, and then calculate the Ppk.
My coworker says that with a pooled stdev, the stdev of the product that is produced the most has more of an impact on the overall Ppk. And I follow him on this.

But my question remains: what is the best approach? Aggregating all the data, or using a pooled stdev?
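To make the two options concrete, here is a minimal sketch of both calculations. The lot names and measurements are made up for illustration. Note that the two estimates differ by more than weighting: the pooled value only looks at variation inside each group, while the overall value also picks up differences between the group averages.

```python
import math
import statistics

# Hypothetical measurements for three lots (made-up numbers, not real data).
lots = {
    "lot_A": [10.1, 10.3, 9.9, 10.2, 10.0],
    "lot_B": [10.6, 10.8, 10.7, 10.5, 10.9],
    "lot_C": [9.6, 9.8, 9.7],
}

def overall_std(groups):
    """Sample standard deviation of all values lumped together."""
    all_values = [x for g in groups for x in g]
    return statistics.stdev(all_values)

def pooled_std(groups):
    """Square root of the degrees-of-freedom-weighted average of group variances."""
    groups = list(groups)
    num = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    den = sum(len(g) - 1 for g in groups)
    return math.sqrt(num / den)

s_overall = overall_std(lots.values())
s_pooled = pooled_std(lots.values())
print(f"overall: {s_overall:.3f}, pooled: {s_pooled:.3f}")
```

With these made-up lots the averages shift from lot to lot, so the pooled value comes out smaller than the overall one.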

Thanks for any input.

Using a pooled standard deviation presupposes that you should be pooling the three different products together. What is the rationale for doing this? The typical approach would be to assess the process performance separately for each product. While managers may say "Give me one number!" that one number is often meaningless when it hides important information. Aggregate numbers often hide information and become less sensitive to changes in the process.
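A per-product assessment could be sketched roughly as follows. The measurements and spec limits are hypothetical, and the function uses the common definition of Ppk based on the overall sample standard deviation of the data passed in.

```python
import statistics

def ppk(values, lsl, usl):
    """Ppk = min(USL - mean, mean - LSL) / (3 * overall sample std dev)."""
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    return min(usl - mean, mean - lsl) / (3 * s)

# Hypothetical data and spec limits for each product (made-up numbers).
products = {
    "X": {"values": [1.90, 2.00, 2.10, 2.00, 2.05], "lsl": 1.7, "usl": 2.3},
    "Y": {"values": [3.10, 3.00, 2.95, 3.05, 3.20], "lsl": 2.6, "usl": 3.4},
    "Z": {"values": [5.00, 5.10, 4.90, 5.05, 4.95], "lsl": 4.5, "usl": 5.5},
}

# One Ppk per product, each against its own spec limits.
for name, p in products.items():
    print(name, round(ppk(p["values"], p["lsl"], p["usl"]), 2))
```

Reporting one value per product keeps a poorly performing product from being masked by the others.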

Let's say I have 3 lots of one product, 5 lots of another, and 8 lots of a third. All are made on the same production line.
What would be the best approach to get a Ppk for this production line?

- Standardize all the measurements (USL / LSL / average) and aggregate all the data, basically giving me 1 product and 1 lot, and then calculate the Ppk
or
- Calculate the Ppk for each lot and product with a pooled stdev, and then take the average Ppk (based on the products)?

Definitely not the 1st option because that would ignore any setup-to-setup variations, which is definitely part of your long-term variation.
In this example, a pooled standard deviation would be the correct approach because you are evaluating the same product, and it does give more weight to larger subgroups. This is the default method used by Minitab. Their explanation is:
• Pooled standard deviation: The pooled standard deviation is the weighted average of subgroup variances, which gives larger subgroups more influence on the overall estimate. This method provides the most precise estimate of standard deviation when the process is in control.
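
For reference, the usual textbook form of the pooled standard deviation for $k$ subgroups, with subgroup sizes $n_i$ and subgroup sample standard deviations $s_i$, is shown below. The symbols are introduced here for illustration; a given software package may apply further adjustments, so check its documentation for the exact formula it uses.

$$
s_p = \sqrt{\frac{\sum_{i=1}^{k} (n_i - 1)\, s_i^2}{\sum_{i=1}^{k} (n_i - 1)}}
$$

Each subgroup variance is weighted by its degrees of freedom $n_i - 1$, which is why larger subgroups carry more influence.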

Thanks again, it's getting clearer for me. Unfortunately, we all know managers that want just 1 result.
Can I ask what precisely you mean by:
"Definitely not the 1st option because that would ignore any setup-to-setup variations, which is definitely part of your long-term variation"

"Let's say I have for one product 3 lots, another 5 lots and another 8 lots. All done on the same production line."
I may have misunderstood you, but I interpreted this as meaning that you produced 3 lots of one product during one production run, then another 5 lots of the same product during a different production run followed by another 8 lots of the same product during a third production run. Each production run would involve a setup, which would all be slightly different from each other. This would be considered setup-to-setup variation.

After a second, more careful reading, I realize that your statement could be interpreted two different ways. Are you still combining three different products? If so, my original comment still stands. You typically should not combine different products.

I can only think of one situation where you could standardize and combine different products. When I was in automotive, we made extruded weatherstripping. We had product families that consisted of the exact same extruded cross-section run on the same extrusion line under the same conditions but cut to several different lengths. In this situation, you could combine the length-independent characteristics into one, and you could standardize the lengths by subtracting the target length and then combining. However, you should be very careful about taking this approach because this is not a common situation.
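As a rough illustration of the length standardization, the sketch below subtracts each part's target length and combines the deviations. The part names, targets, and measurements are hypothetical, and it assumes every length in the family has the same tolerance width, which would need to be confirmed before combining.

```python
# Hypothetical cut-length data for one product family (mm); all numbers are made up.
family = {
    "part_500": {"target": 500.0, "lengths": [500.4, 499.8, 500.1]},
    "part_750": {"target": 750.0, "lengths": [750.3, 749.6, 750.2]},
}

def deviations_from_target(family):
    """Subtract each part's target length so all parts share one scale centred on 0."""
    combined = []
    for part in family.values():
        # Rounding only removes floating-point noise from the subtraction.
        combined.extend(round(x - part["target"], 6) for x in part["lengths"])
    return combined

combined = deviations_from_target(family)
print(combined)
```

The combined deviations can then be assessed against the tolerance expressed relative to target (for example, minus and plus the allowed deviation).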

Just make up a number to tell your manager - it’s easier and just as insightful. I may be a bit sarcastic but I’m not kidding. No statistical analysis should be undertaken unless the manipulator of the mathematical formula actually understands (1) the math and (2) how process variation actually works. No offense but you obviously don’t understand either of these two points. You are to be commended for coming here and asking questions but it is essential to truly understand this stuff. No mathematical formula is a substitute for knowledge or thinking.

Ppk is supposed to be an index representing the true and actual variation of all individual parts. Sure, larger subgroups tend to ‘skew’ the statistical answer, but that is the truth. Actual parts don’t care about statistical theory or manipulations. Pooling standard deviations from subgroup to subgroup only works when the process is actually homogeneous. Set-up to set-up variation (if it exists) contributes to non-homogeneity.

You should also plot your data in time series FIRST to understand the process variation. Using control limits will help you see if any non-homogeneity exists.
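
One common way to put control limits on individual values in time order is an individuals (XmR) chart. The sketch below is one possible version, not necessarily the chart meant above: it computes the limits from the average moving range using the standard 2.66 scaling constant and lists the points that fall outside. The data series is made up.

```python
import statistics

def xmr_limits(values):
    """Individuals chart limits: mean +/- 2.66 * average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    centre = statistics.fmean(values)
    return centre - 2.66 * mr_bar, centre, centre + 2.66 * mr_bar

def out_of_limits(values):
    """Indices of points that fall outside the individuals chart limits."""
    lcl, _, ucl = xmr_limits(values)
    return [i for i, x in enumerate(values) if x < lcl or x > ucl]

# Hypothetical measurements in production order (made-up numbers).
series = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 9.9, 10.0, 11.5, 10.1]
lcl, centre, ucl = xmr_limits(series)
print(f"LCL={lcl:.2f}, centre={centre:.2f}, UCL={ucl:.2f}")
print("points outside limits:", out_of_limits(series))
```

Points outside the limits, or obvious shifts between runs, are a sign that the data are not homogeneous and that a single pooled number would hide something.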

Unless, of course, you do not really care about gaining insight regarding the process, in which case we are back to my original advice. Just make a number up.

OK now that I got that rant out of my system, try reading these three articles. They will help with the whole homogeneity thing and they are free.
Thanks Bev D for your post. I've been lurking quite a while here on the forum and trying to learn as much as I can. I don't have much experience in SPC, but I want to learn. We have a software solution that outputs a Ppk value that we should report to our manager. But I started to question how it all works in the background, so I tried calculating everything on my own. That's how I ended up in this forum. The articles from D. Wheeler are very insightful, thanks for sharing.
So to recap: if I have 1 test (let's say thickness of potato slices) on 1 production line with multiple products (and multiple specs), each with multiple lots, pooling everything together to get 1 Ppk value for the test is only (somewhat) interesting if the data is homogeneous. I should plot all the data in a time series to check the process variation.