Extremely capable data but fails normality; non-normal fit not justifiable

Forbes82

This relates to Design Verification testing for a complex electromechanical medical device (only 5 test articles on hand, we have to rely on repeats to get to n=30). It does not relate to ongoing process control!

Our procedures ask for data normality testing prior to calculation of Ppk values (or equivalent tolerance interval testing) for the given % reliability and % confidence (which are driven by DFMEA risk levels). However, in many cases the data is extraordinarily capable (e.g. Ppk would be 50 or more!) but, due to the lack of variation within individual test articles, it fails normality check. Transformation or fitting to another non-normal distribution isn't working either, for the same reason.
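To illustrate (with made-up numbers, not our actual data) how repeats that barely vary within each test article can sink a normality test even when capability looks huge:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical numbers: 5 devices x 6 repeats against a spec of 10.0 +/- 1.0.
# Device-to-device spread is tiny and repeats barely move, so the pooled
# n=30 "sample" is really 5 tight clusters - exactly what trips a normality test.
device_means = rng.normal(10.0, 0.01, size=5)
data = np.concatenate([m + rng.normal(0.0, 0.001, size=6) for m in device_means])

print(stats.shapiro(data))                            # p-value << 0.05: "fails normality"
print((11.0 - data.mean()) / (3 * data.std(ddof=1)))  # naive one-sided Ppu: enormous
```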

Obviously I have no concern about the design's ability to meet the defined requirement, but at the minute I have to 'fail' the test.

QUESTION: is anyone aware of any guidance/standard which might say something like "for Ppk levels greater than 2.00, normality testing is not required"? Or to spin the math around another way, instead of our 95%/95% acceptance criteria, could we do tolerance interval testing for something like 99.9%/95% without the normality check?
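For context, the k-factors behind those one-sided tolerance-interval criteria can be reproduced with the standard noncentral-t formula. A minimal sketch (note the formula itself assumes normality, which is exactly the crux of the problem - raising 95%/95% to 99.9%/95% tightens k but doesn't remove the normality assumption):

```python
import numpy as np
from scipy import stats

def k_one_sided(n, reliability, confidence):
    """One-sided normal tolerance-bound factor k (noncentral-t method),
    so that xbar + k*s covers `reliability` of the population with `confidence`."""
    ncp = stats.norm.ppf(reliability) * np.sqrt(n)
    return stats.nct.ppf(confidence, df=n - 1, nc=ncp) / np.sqrt(n)

for rel in (0.95, 0.999):
    print(f"n=30, {rel:.1%}/95%: k = {k_one_sided(30, rel, 0.95):.3f}")
```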

I have also considered:
  • Attribute sampling - not practical where reliability is 99%, as I need n = 299 per the Bayes success-run theorem (see the worked numbers after this list).
  • Writing a justification that "we expect the data to be normally distributed, but due to special causes x/y/z the data in this case fails normality check. The data will be analysed as if it were normally distributed" - such subjectivity is not well received in medical device regulation.
  • Remove the requirement for normality - I know this isn't right, though I mention it as (interestingly) many medical device companies I've worked at don't require it!
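For reference, the n = 299 figure above falls straight out of the zero-failure success-run formula:

```python
import math

# Success-run (zero-failure) sample size: smallest n with 1 - R**n >= C,
# i.e. n >= ln(1 - C) / ln(R)
C, R = 0.95, 0.99
print(math.ceil(math.log(1 - C) / math.log(R)))   # 299
```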
NOTE: I'm looking for something similar to this example - for destructive Gage Studies (TMV), where 'identical' samples cannot be made (hence a nested GR&R isn't workable), I refer to ASTM F3263-17, which simply requires an observed Pp of >2.00 (including both part-to-part and test-method variation) because "the test method can discriminate at least 1/12 of the tolerance and hence the resolution is adequate. Therefore, no additional analysis such as a Gage R&R Study is necessary."


I suspect I will be lectured (hi Bev) on 'misuse of Ppk' but (dare I say it) this is part of the game in medical device design verification, when dealing with low risk requirements which we have high confidence in meeting! We just need a good reference to appease the regulators!

Thanks
 
Lecture: Yeah, yeah. I know some Customers that are obsessed with the Normal distribution and that statistical abomination known as Ppk. Some fake/hack statisticians just can't use the logic Mother Nature gave us. While there are references saying that the Normal distribution is a theoretical model that doesn't exist in real life, and references concerning the futility of Ppk and the Normal distribution, I doubt these will be more than an exercise in kicking the can down the road. (Read Wheeler - @Miner can probably cite the appropriate references. You can also see my "Essential References" paper and my "Statistical Alchemy" paper.) Then remove the requirement if it is an internal requirement.

Also figure out what the purpose of calculating Ppk is. The only "legitimate" reason is to predict the potential defect rate. But at high Ppk the idea that there are actual values beyond 4 or 5 sigma is just magical thinking and ignorance of real life. As I have said, a toothpick factory will never produce a telephone pole even if several trillion toothpicks are made…
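To put numbers on that: under a normal fit, the tail probability beyond the nearest spec limit is Φ(−3·Ppk). A quick sketch of what the model "predicts":

```python
from scipy import stats

# Tail probability beyond the nearest spec limit implied by a normal fit: Phi(-3*Ppk)
for ppk in (1.0, 1.33, 2.0, 50.0):
    ppm = stats.norm.sf(3 * ppk) * 1e6
    print(f"Ppk = {ppk:>5}: {ppm:.3g} ppm")
# At Ppk = 50 this underflows to exactly 0 in double precision - a number far
# beyond anything a normal fit to 30 points can honestly claim.
```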

And many statistical reviewers in the pharma/medical device world know this is just pseudo statistics…

Sorry haven’t had my coffee yet!
 
Our procedures ask for data normality testing prior to calculation of Ppk values
If your procedure requires normality, there is no point in bringing up a reference stating that for Cpk >= <some value> normality is not required. Hence the main question is whether you are able to change your procedure.

only 5 test articles on hand, we have to rely on repeats to get to n=30
This sounds like you are using "repeated measurements" (also called subsampling). These are correlated and must not be mistaken for "independent samples". Check your requirements.
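A small illustration (with invented variance numbers) of why the repeats add little information:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical: 5 devices x 6 repeats; device-to-device sd 0.10, repeat sd 0.01.
devices = rng.normal(0.0, 0.10, size=5)
data = np.stack([d + rng.normal(0.0, 0.01, size=6) for d in devices])

# The 6 repeats mostly re-measure the same device value, so treating all 30
# numbers as independent overstates how much information you really have.
se_naive = data.std(ddof=1) / np.sqrt(30)               # pretends n = 30
se_honest = data.mean(axis=1).std(ddof=1) / np.sqrt(5)  # one mean per device: n = 5
print(se_naive, se_honest)
```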

it fails normality check
Are you required to use a specific test, e.g. the Anderson-Darling test? I hope you are free to select the test. If so, use a method that relies on your own judgment and does not produce a p-value. I recommend using a quantile-quantile plot and arguing that the points are "good enough".
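For example (file name hypothetical):

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

data = np.loadtxt("results.txt")   # hypothetical file with the 30 verification results

# Normal quantile-quantile plot: judged by eye ("is the pattern straight enough?")
# rather than by an automatic p-value.
stats.probplot(data, dist="norm", plot=plt)
plt.title("Normal Q-Q plot, n = 30")
plt.savefig("qq_plot.png")
```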

Finally, there is a non-parametric formula for the Ppk value; see ISO 22514-4. It uses quantiles instead of the standard deviation. This is the recommended formula if the dataset is non-normal.
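A minimal sketch in the spirit of that quantile-based method (not the standard's exact estimator - check ISO 22514-4 for the distribution-fitting and quantile-estimation rules it mandates):

```python
import numpy as np

def ppk_quantile(data, lsl, usl):
    """Quantile-based Ppk in the spirit of the ISO 22514-4 percentile method."""
    x_lo, x_med, x_hi = np.quantile(data, [0.00135, 0.5, 0.99865])
    ppu = (usl - x_med) / (x_hi - x_med)
    ppl = (x_med - lsl) / (x_med - x_lo)
    return min(ppu, ppl)
```

One caveat: with n = 30 the 0.135% and 99.865% sample quantiles land essentially on the sample minimum and maximum, so the result leans heavily on two data points.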
 
This is a case of subgroups (test units or devices). They probably have 6 runs each? Depending on what the device does and what each 'run' is, the runs may be independent within each device. The devices will be independent of each other. IF the Ppk value is calculated from all of the data within each device, and there is no pooling of the data between devices, then we still have a very capable device. The distribution is likely close to uniform, given the small number of runs and the fact that the 5 devices were probably made at the same time with the same lots of components. If this is true, the OP could use the uniform distribution to calculate Ppk. I have the formulas. Although, having done exactly this for the last 20 years, even 5 devices and 6 runs per device is extraordinarily small for a Ppk calculation.
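I don't know which exact formulas are meant here, but a minimal sketch of one plausible version - the usual 3-sigma Ppk definition with the uniform-distribution standard deviation - might look like this (a quantile-based ISO 22514-style definition would give a somewhat different number):

```python
import numpy as np

def ppk_uniform(data, lsl, usl):
    """Ppk under a uniform-distribution assumption: sd of U(a, b) is (b-a)/sqrt(12).
    Endpoints estimated naively from the sample range (slightly biased inward)."""
    a, b = float(np.min(data)), float(np.max(data))
    mu = (a + b) / 2.0
    sigma = (b - a) / np.sqrt(12.0)
    return min(usl - mu, mu - lsl) / (3.0 * sigma)
```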

If the OP could provide more details, as I've mentioned above, we could provide much better help.

Although Ppk is still an abomination on the face of the earth. : ). Pi$$ing around with statistical gymnastics is a waste of time. I’d love to see the data.
 