Process Capability for Parameters with a Non-Normal Distribution

v9991

Trusted Information Resource
We have a parameter with a one-sided limit of "not more than 2%":
* 95% of batches have values <= 0.2%
* approximately 3% have values < 0.5%
* the other 1% are spread across other values.

Request for help (I am using Minitab):
==> We are required to calculate Cp/Cpk values for this parameter.
==> Even using Capability Sixpack / Capability Analysis for non-normal distributions, we are unable to find a suitable fit with any distribution.
 

Miner

Forum Moderator
Leader
Admin
First, verify that you do not have a mixture of different process streams. Next, investigate whether the process is stable (in control). Many times, these are the cause of the issue. If the process is homogeneous and stable, you should have some type of recognizable distribution.

As a last resort, there is Minitab's Nonparametric Capability Analysis macro.
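For readers without the macro, here is a minimal sketch of the empirical-percentile idea behind nonparametric capability. This is only an illustration, not Minitab's macro itself; the data and the `nonparametric_ppu` helper are hypothetical.

```python
import numpy as np

def nonparametric_ppu(values, usl):
    """Percentile-based Ppu for a one-sided upper spec: the empirical
    50th and 99.865th percentiles stand in for the mean and mean + 3 sigma."""
    p50, p99865 = np.percentile(values, [50, 99.865])
    return (usl - p50) / (p99865 - p50)

# Hypothetical stand-in for real batch results (% w/w), USL = 2%
rng = np.random.default_rng(1)
values = rng.lognormal(mean=np.log(0.15), sigma=0.4, size=300)
print(f"Nonparametric Ppu ~= {nonparametric_ppu(values, usl=2.0):.2f}")
```

Because it relies on extreme empirical percentiles, this approach needs a fairly large sample to be trustworthy.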
 

Bev D

Heretical Statistician
Leader
Super Moderator
What is the parameter, and how is it measured?

Miner's advice on checking for stability (actually homogeneity) is essential.
 

v9991

Trusted Information Resource
What is the parameter, and how is it measured?

I am referring to a pharmaceutical product made by a batch manufacturing process. The parameters are, for example, moisture content and solvent content; these are typically sampled at the end of the process and measured by Karl Fischer titration and gas chromatography, respectively.

First, verify that you do not have a mixture of different process streams. Next, investigate whether the process is stable (in control).

Agreed; we often do not find specific reasons (at least not immediately).
If and when we find a correlation to process/material attributes, can we exclude those data points to reassess the distribution and capability (on the basis that the causes have been duly addressed)?
 

Miner

Forum Moderator
Leader
Admin
Any chance you can provide the data in time sequence? You can strip out anything confidential, provided the time order, data, and subgrouping are clear.

I would speculate that you have a lot of batch to batch variation as the primary source of variability. Have you performed any MSA studies on the test itself?

If you make a specific (and permanent) change, it is perfectly acceptable to collect new data after the change and analyze it for capability, excluding data from prior to the change.
 

v9991

Trusted Information Resource
Thank you Miner,

Here's the file we are trying to work on,

Further, we haven't done an MSA, but the methods are validated (RSD ≈ 10%; specification of 2%; median/min/max = 0.2/0.09/0.33).
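(As a rough screen, not a substitute for the MSA Miner asked about, these figures suggest the measurement spread consumes only a few percent of the tolerance. A back-of-envelope sketch, assuming the 10% RSD applies at the median result of 0.2%:)

```python
# Back-of-envelope precision-to-tolerance check (assumed figures from above)
rsd = 0.10           # validated method RSD ~ 10%, relative to the result
median_result = 0.2  # median reported value, % w/w
usl = 2.0            # one-sided spec: not more than 2%

sigma_meas = rsd * median_result        # ~0.02 in the same % units
p_to_t = 6 * sigma_meas / (usl - 0.0)   # tolerance taken as 0 to USL
print(f"Measurement spread ~ {p_to_t:.0%} of tolerance")  # ~6%
```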
 

Attachments

  • distribution.xlsx
    13.5 KB · Views: 255

Miner

Forum Moderator
Leader
Admin
Your process has a lot going on. The basic problem you are having is caused by a very unstable process; therefore, you cannot fit a distribution or assess capability.

Look at the attached graph. The process shown in the upper left is out of control and appears to have experienced a large shift downward at observation 248 (not counting missing observations). After splitting the data after observation 248, the process shown in the upper right still exhibits smaller shifts and occasional extreme values.

The bottom two graphs show that the process would tend toward normality if it were stabilized. However, you may have to judge this visually rather than with a normality test. Note the probability plot in the lower right: even though the data for Stage 1 fall on a straight line and would visually pass the "fat pencil" test for normality, the p-value is still quite small. This is due to the chunkiness of the data, which in turn is due to the resolution of the measurement system.
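For anyone replicating this split outside Minitab, a minimal sketch of the staged individuals-chart check described above, with hypothetical data standing in for the attachment (the split point of 248 comes from the post):

```python
import numpy as np

def i_chart_limits(x):
    """Individuals-chart limits from the average moving range (sigma = MRbar / 1.128)."""
    x = np.asarray(x, dtype=float)
    mr_bar = np.mean(np.abs(np.diff(x)))
    sigma = mr_bar / 1.128          # d2 for a moving range of size 2
    return x.mean() - 3 * sigma, x.mean(), x.mean() + 3 * sigma

# Hypothetical stand-in for the attached data, with a downward shift at
# observation 248 as described above
rng = np.random.default_rng(2)
series = np.concatenate([rng.normal(0.25, 0.03, 248), rng.normal(0.18, 0.03, 150)])
for name, stage in [("Stage 1", series[:248]), ("Stage 2", series[248:])]:
    lcl, center, ucl = i_chart_limits(stage)
    print(f"{name}: LCL={lcl:.3f}  CL={center:.3f}  UCL={ucl:.3f}")
```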
 

Attachments

  • Batch Process.jpg
    49.3 KB · Views: 349

bobdoering

Stop X-bar/R Madness!!
Trusted Information Resource
I have attached two approaches, assuming the data are homogeneous. I did not look at the details of the process as Miner did. As to which is preferred, I recommend reading my blog Unilateral Tolerance Capability Calculation.

I do NOT recommend reporting Ppk or Cpk, as you do not have a process where the target is in the center. The point of Ppk and Cpk is to determine how centered between the specifications the process is. I assume your target is zero.

Cp or Pp would be far more appropriate. They simply illustrate how well your process variation (when represented by the correct model, or distribution) fits within the tolerance.

The expected variation should be non-normal if it is close to the physical limit of zero. If the actual distribution is normal, then you need to evaluate the data's location relative to the influence of the physical limit. The further away it is, the more normal it may be.

The other point is that you are using so little of your tolerance that the effort to further analyze the process variation needs to be balanced against its value added (or value maintained, if that can be established). Measurement error, with its normal distribution, can also mask process variation. The non-normal distributions have a very high p-value, especially compared to the normal distribution, so I would be surprised if the true underlying process distribution were normal.

However, it would be good to understand whether the shifts Miner observed are expected long term for this process (common cause), or whether the shifts can be eliminated (special cause) through better controls. Again, the controls need to be balanced against the economic need, based on how little of the tolerance is actually being used.
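As one way to act on this advice, here is a hedged sketch of fitting a zero-bounded model and reading Pp/Ppu off its percentiles. The lognormal choice and the data are assumptions for illustration (Weibull or gamma are other zero-bounded candidates), and the percentile substitution shown is one common convention; bobdoering's blog and attachment describe his preferred unilateral calculation, which may differ.

```python
import numpy as np
from scipy import stats

# Hypothetical zero-bounded data standing in for the attached results
rng = np.random.default_rng(3)
data = rng.lognormal(mean=np.log(0.15), sigma=0.4, size=300)

# Fit a lognormal with its location fixed at the physical 0 bound
shape, loc, scale = stats.lognorm.fit(data, floc=0)
dist = stats.lognorm(shape, loc=loc, scale=scale)

usl = 2.0
p00135, p50, p99865 = dist.ppf([0.00135, 0.5, 0.99865])
pp = (usl - 0.0) / (p99865 - p00135)   # tolerance treated as 0 to USL
ppu = (usl - p50) / (p99865 - p50)     # one-sided percentile analogue
print(f"Pp ~= {pp:.2f}, Ppu ~= {ppu:.2f}")
```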
 

Attachments

  • Distribution calculated with a 0 bound.docx
    272 KB · Views: 157

v9991

Trusted Information Resource
First, a BIG :thanks::thanks: for taking the time to go through the data and diagnose the situation.

Your process has a lot going on. The basic problem you are having is caused by a very unstable process; therefore, you cannot fit a distribution or assess capability. [...]
@Miner: YES, it is counterintuitive to start by assessing the distribution/capability of an unstable process, BUT to explain the situation:
a) we are starting to use the tools of SPC/SQC, and this is the first step;
b) we have these preceding/prerequisite steps defined (control charts and assessment);
c) one of the steps was to arrive at a baseline of Cpk/Ppk.
Given this background, and based on your comment, is it reasonable to assess the (nearest) distribution based on a visual/techno-functional assessment and evaluate Cpk/Ppk?

I do NOT recommend reporting Ppk or Cpk, as you do not have a process where the target is in the center. [...] Cp or Pp would be far more appropriate. [...]

@bobdoering
wrt centered target ==> the specification limit is a "not more than" criterion; hence we would prefer the values to be as near to zero as possible.
wrt target ==> yes, the target is zero.
wrt Cpk/Ppk ==> do you recommend that Cp/Pp would be adequate?

wrt reality...
The above data set is the best-case scenario, i.e., a considerable amount of data.
* On the other hand, there are other products with only 10-20 lots, where the above scenario becomes tough to deal with (with respect to obtaining Pp/Cp).
* What is the best approach for this scenario of 10-20 lots (let's say we insist on looking at Pp/Cp values)?

Kindly confirm, and thank you in advance for the guidance.
 

bobdoering

Stop X-bar/R Madness!!
Trusted Information Resource
wrt Cpk/Ppk ==> do you recommend that Cp/Pp would be adequate? [...]

Yes, the Pp of 2.32 is more than adequate to show capability.


* What is the best approach for this scenario of 10-20 lots (let's say we insist on looking at Pp/Cp values)?

If you are trying to show process capability over all of the lots (homogeneous process), sampling the lots for a process like this should be appropriate. Combine all samples into your distribution and analyze capability together.

Remember, capability indices are ballpark figures. There is no way one number can reliably illustrate the variation you can expect over the life of the process. So don't spend too much time nitpicking the details of a capability index; it simply isn't that powerful.
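To see just how "ballpark" the number is with only 10-20 lots, here is a quick bootstrap sketch (hypothetical data; the percentile-based Ppu from earlier in the thread is reused). The width of the interval is the point:

```python
import numpy as np

def ppu(x, usl):
    """Percentile-based Ppu for a one-sided upper spec."""
    p50, p99865 = np.percentile(x, [50, 99.865])
    return (usl - p50) / (p99865 - p50)

# With ~15 lots, resampling shows how loose the single index really is
rng = np.random.default_rng(4)
lots = rng.lognormal(mean=np.log(0.15), sigma=0.4, size=15)  # hypothetical lots
boot = [ppu(rng.choice(lots, size=lots.size, replace=True), usl=2.0)
        for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"Point estimate {ppu(lots, 2.0):.1f}, 95% bootstrap CI ({lo:.1f}, {hi:.1f})")
```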
 