Stress / Challenge Conditions for Design Verification Testing to Reduce Sample Size

cdmdux

Registered
Recently my R&D engineers and I have been having a dispute about sample sizes. They have fallen in love with a book that they have used at a previous company, Wayne Taylor's Statistical Procedures for the Medical Device Industry. I have no problem with the book, but they've latched on to one component discussed in Appendix A of STAT-03, which is the "Stress Condition."

According to Dr. Taylor, the "Stress test is expected to produce more failures than will occur under normal conditions." He says the stress-test failure rate must be 5x the nominal failure rate, and conveniently mentions ship testing per the ASTM standards as an established stress condition. This allows for a reduced sample size, from the nominal 95/95 we would use (59 samples, attribute) to 90/90 (22 samples), based on being able to create and detect failures at a higher rate.

I have personally never heard of reducing a sample size because you are using an established ship-test method that is "worst-case," but who am I to disagree with a statistician? I just can't get over all my experience at previous companies using at least 95/95 for product sterility.

Anyone have experience with this method? Does it fly with FDA/NBs?
 

Miner

Forum Moderator
Leader
Admin
I cannot answer your question from the Medical Device perspective, but can offer some thoughts from a reliability perspective.
  • Higher stress conditions will reduce the time required for wear-out failures to occur. The same number of wear-out failures would eventually occur, but you are seeing them in a shorter time span, so you could interpret this as more failures (a sketch of this follows the list).
  • Higher stress conditions may create or increase an overlap of the stress/strength distributions resulting in more failures. However, this is usually done to identify the operating and destruct limits of a device.
  • Too much of an increase in stress conditions may result in failure modes that you would never see under normal conditions.
  • I am not familiar with the 95/95, 90/90 scheme, but in reliability, sample sizes are driven by the required reliability, the desired confidence level, and the time constraints of the test, not by the stress level.
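To put a rough number on the first bullet, here is a minimal sketch of the time-compression idea, assuming an Arrhenius temperature acceleration and a Weibull wear-out model - both of which are illustrative assumptions, not anything stated in this thread, and the activation energy and life values are hypothetical:

```python
import math

# Arrhenius temperature acceleration factor (assumed model, temperatures in Kelvin):
# AF = exp( Ea/kB * (1/T_use - 1/T_stress) )
def arrhenius_af(ea_ev, t_use_c, t_stress_c, kb_ev_per_k=8.617e-5):
    t_use, t_stress = t_use_c + 273.15, t_stress_c + 273.15
    return math.exp(ea_ev / kb_ev_per_k * (1.0 / t_use - 1.0 / t_stress))

# Weibull wear-out model: fraction failed by time t is F(t) = 1 - exp(-(t/eta)**beta).
# Acceleration shrinks the characteristic life eta by AF, so the same wear-out
# failures show up in roughly 1/AF of the test time.
def weibull_fraction_failed(t, eta, beta):
    return 1.0 - math.exp(-((t / eta) ** beta))

af = arrhenius_af(ea_ev=0.7, t_use_c=25, t_stress_c=60)  # hypothetical activation energy
eta_use, beta = 5000.0, 2.0                              # hypothetical wear-out life (hours) and shape
eta_stress = eta_use / af

print(f"acceleration factor ~ {af:.1f}")
print(f"fraction failed after 500 h at use conditions:    {weibull_fraction_failed(500, eta_use, beta):.5f}")
print(f"fraction failed after 500 h at stress conditions: {weibull_fraction_failed(500, eta_stress, beta):.5f}")
```

The failures are the same wear-out failures either way; the higher stress just pulls them into the test window.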
 

Bev D

Heretical Statistician
Leader
Super Moderator
I use worst-case conditions a lot to reduce the sample size - particularly when the current failure rate is low (<1%). The whole idea is based on the stress/strength interaction. In typical sampling we take a random sample, which means that in order to see 'weak' parts fail under 'nominal use' conditions we have to amp up the sample size to detect a low failure rate. Also, the common sample size formulas are based on one of two things: trying to characterize the failure rate, or determining whether the failure rate will be at or below some stated level.

But think about the physics: if we take the 'weakest' parts allowed by the design/process tolerances and subject them to the worst-case stress/use conditions (not foolish ones, as Miner alluded to - you must verify/validate that they are worst case and not foolish) and there are no failures, then we have succeeded! And we don't need a lot of samples, because now we are in a deterministic rather than a probabilistic world. (Like dropping your pen: you don't need a statistical sample size to know that it had better fall toward the earth; if it doesn't, you have a problem.) In other words, this is known as margin testing. If we do get failures, we can estimate the overall failure rate by knowing how frequently the worst-case conditions occur (the distribution of stress) and how frequently weak parts will occur (the distribution of parts). This isn't always possible, nor is it easy. The best way is to use the results of your validation testing to improve the strength of your parts (increase the margin).
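To put numbers on the stress/strength interaction, here is a minimal sketch with independent normal stress and strength distributions; all of the values are hypothetical and only chosen to show why a random sample at nominal use has to be huge while a margin check does not:

```python
from scipy.stats import norm

# Stress/strength interference for independent normal distributions:
# P(failure) = P(strength < stress) = Phi((mu_stress - mu_strength) / sqrt(sd_stress**2 + sd_strength**2))
def interference_failure_rate(mu_strength, sd_strength, mu_stress, sd_stress):
    z = (mu_stress - mu_strength) / (sd_stress**2 + sd_strength**2) ** 0.5
    return norm.cdf(z)

# Random sampling under nominal use: the overlap is tiny, so an enormous sample is
# needed just to have a chance of seeing a weak part meet a high stress.
p_random = interference_failure_rate(mu_strength=100, sd_strength=5, mu_stress=60, sd_stress=6)
print(f"failure rate under random sampling at nominal use: {p_random:.2e}")

# Margin testing asks a deterministic question instead: does the weakest allowed
# part still beat the worst-case stress?
weakest_strength = 100 - 3 * 5   # e.g., lower tolerance limit of strength
worst_case_stress = 60 + 3 * 6   # e.g., worst-case use condition
print(f"margin = {weakest_strength - worst_case_stress} (positive means the worst-case part survives the worst-case stress)")
```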

Like most things, Dr. Taylor's book makes this sound rather easy. As with almost all quality concepts - indeed all of physics - the "Cliff's Notes" version makes it all sound so very easy.

You can use this approach in the medical device industry for most things. It all depends on how you present it to the reviewers and statistical reviewers at your regulatory agency.
 

cdmdux

Registered

Thanks Bev, I appreciate the input. I think it makes sense from a physics perspective, but I'm not sure it makes sense applied as broadly as the R&D group has proposed. For example, on our sterile barrier package ship testing, I do not think it makes sense to reduce the sample size just because we are following the ASTM standard, where the drops are X% higher than we would normally expect and the temperature cycles are extreme and compressed into a matter of hours rather than weeks. That alone does not give carte blanche to reduce sample sizes.

What kind of data would you want to see in this case to reduce the sample size?
 

Steve Prevette

Deming Disciple
Leader
Super Moderator
What kind of data would you want to see in this case to reduce the sample size?

The original article you quoted referred to the "stress test rate must be 5x the nominal failure rate."

So apparently you would want to see that in a "normal" test, the failure rate is 1/5th of the "stress test" rate.

It would be a good idea to run a good-sized "normal" test in parallel with the stress test in order to see that the failure modes are similar and to verify the 5x difference.
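Here is a minimal sketch of what that parallel comparison might look like, using exact (Clopper-Pearson) binomial confidence intervals; the counts are made up purely for illustration:

```python
from scipy.stats import beta

# Exact (Clopper-Pearson) two-sided confidence interval for a binomial proportion.
def clopper_pearson(failures, n, conf=0.95):
    alpha = 1 - conf
    lo = 0.0 if failures == 0 else beta.ppf(alpha / 2, failures, n - failures + 1)
    hi = 1.0 if failures == n else beta.ppf(1 - alpha / 2, failures + 1, n - failures)
    return lo, hi

# Hypothetical parallel-test results: 100 units per leg.
stress_lo, stress_hi = clopper_pearson(failures=12, n=100)   # stress-test leg
normal_lo, normal_hi = clopper_pearson(failures=2, n=100)    # "normal" leg

print(f"stress-test failure rate 95% CI: ({stress_lo:.3f}, {stress_hi:.3f})")
print(f"normal-test failure rate 95% CI: ({normal_lo:.3f}, {normal_hi:.3f})")
# Most conservative ratio consistent with these data - with only 100 units per leg
# it cannot rule out a ratio far below 5x, which is why the normal leg must be large.
print(f"ratio could be as low as ~{stress_lo / normal_hi:.1f}x")
```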

By the way, the 95/95 and 90/90 rates are very common (59 and 22) and relate to go/no-go tests. If I test 59 items and none of them fail, then I am 95% confident that no more than 5% are bad. Similarly, if I were to test 22 with no failures, then I am 90% confident that no more than 10% are bad.
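For reference, those two numbers come straight from the zero-failure ("success-run") calculation, n = ln(1 - C) / ln(R), rounded up:

```python
import math

# Zero-failure attribute sampling: n passes with no failures gives confidence C
# that reliability is at least R when R**n <= 1 - C, i.e. n >= ln(1 - C) / ln(R).
def success_run_n(confidence, reliability):
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

print(success_run_n(0.95, 0.95))  # 59 -> the 95/95 plan
print(success_run_n(0.90, 0.90))  # 22 -> the 90/90 plan
```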

But we may be talking apples and oranges if you are referring to time to failure ("Higher stress conditions will reduce the time required for wear-out failures to occur"). Perhaps I could argue that I don't need to run a test as long when running a stress test (a common application), but it may be a stretch to argue for a smaller sample size. There is also a needed discussion of whether we are looking for "weak" parts that are significantly different from other parts, or whether all parts are basically equal with an exponential time to failure driven by randomness.

And I don't know about STAT-03, but here is a link to STAT-04. The discussion there is more about design margin than acceptance sampling (are these parts good?).

STAT-04: Statistical Techniques for Design Verification (linkedin.com)

Quoting

  • Worst-case testing allowing 1-5 units to be tested at each worst-case setting. Worst-case conditions are the settings for the design outputs that cause the worst-case performance of the design inputs. When worst-case conditions can be identified and units can be precisely built at or modified to these worst-case conditions, a single unit may be tested at each of the worst-case conditions. This ensures the design functions over the entire specification range. This approach is generally preferable to testing a larger number of units toward the middle of the specification range. When units cannot be precisely built at or modified to these worst-case conditions, multiple units may have to be built and tested.
Strategies for Reducing the Sample Size - For when a sampling plan is used
  • Variables data - having a measurement instead of attribute pass/fail results, allows variable sampling plans to be used. They require as few as 15 samples in contrast to the minimum of 299 above.
  • Stress testing – testing a small number of units using a method that induces more failures than expected in the field. Note the stress test column in the table above. This can include design margin as described in Appendix A of STAT-03.
  • Multiple tests on the same unit. A sample size of 30 means 30 tests are required. Under certain circumstances it may be possible to test 3 units 10 times each.
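The variables-data bullet quoted above is the usual normal tolerance-limit idea. Here is a minimal sketch of a one-sided tolerance factor, with the caveat that whether this matches Taylor's specific 15-sample plan is my assumption, and the 95/99 numbers are only an example:

```python
import math
from scipy.stats import norm, nct

# One-sided normal tolerance factor k: if xbar - k*s is still above the lower spec
# limit, you can claim with the stated confidence that at least `reliability` of
# the population is above that limit - a variables plan needing far fewer samples
# than the zero-failure attribute plans.
def one_sided_k(n, confidence, reliability):
    zp = norm.ppf(reliability)
    return nct.ppf(confidence, df=n - 1, nc=zp * math.sqrt(n)) / math.sqrt(n)

k = one_sided_k(n=15, confidence=0.95, reliability=0.99)
print(f"k ~ {k:.2f}  (demonstrate by showing xbar - k*s >= lower spec limit)")
```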
 

Bev D

Heretical Statistician
Leader
Super Moderator
The ASTM standard already dictates high stresses. Packaging typically has very little (relative) allowable variation by design, so it may be difficult or even irrelevant to try to get the weakest allowable packaging - with the possible exception of the maximum time your cooling devices (e.g., ice) [strength] are exposed to ambient conditions, or how long the package may be in shipping in hot environments - sitting on the tarmac and/or loading dock (stress).

In these cases we do reduce sample sizes to 1-3 packages. Of course, our regulatory body recognizes the physics justification and has no problem with it. Yours might be different.
 

Tidge

Trusted Information Resource
I am not familiar with Wayne Taylor's book, but it occurs to me that in the absence of a recognized determination of what "stress rate greater than 5x the nominal failure rate" means, the development team may have to 'burn' through many more samples to determine/establish such a rate with any precision/confidence than they would otherwise consume simply by using the larger sample size. The development team then has to do all the hard work involved in establishing those new limits. My professional experience has been that unless a group has an unusually high level of commitment to a particular product (or material type), they will not get a positive return on their investment for developing rigorous test methodologies and well-established test limits.
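That "burn through samples" point can be made concrete with a quick binomial calculation; the failure rates below are hypothetical:

```python
import math

# How many units must be tested at nominal conditions just to have a given chance
# of observing at least one failure, when the true failure rate is p?
# P(at least one failure) = 1 - (1 - p)**n  =>  n >= ln(1 - target) / ln(1 - p)
def n_to_see_a_failure(p, target=0.95):
    return math.ceil(math.log(1 - target) / math.log(1 - p))

for p in (0.05, 0.01, 0.001):   # hypothetical nominal failure rates
    print(f"p = {p:.3f}: ~{n_to_see_a_failure(p)} units for a 95% chance of seeing at least one failure")
```

So just characterizing a low nominal failure rate, let alone establishing the 5x ratio, can cost far more units than the 59-sample attribute plan it was meant to avoid.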

It is entirely possible that an industry group or standards organization has done the work to establish such stress rates for your product of interest... for example, non-medical-device industries have established (many) ASTM standards that allow for (what might appear to be ridiculously) small sample sizes based on attribute sampling, through rigorous testing and physical analysis.
 

Bev D

Heretical Statistician
Leader
Super Moderator
As someone who does this a lot, it really isn't that difficult. It's just really good development to requirements - and maybe development of really good requirements. Most companies are designing the same thing over and over again. Let's take cars: certainly all engineers know how to select components (screws and bolts), processes (welding), and metals that have published 'ratings' or capabilities, and they certainly know what their design has to survive - crashes, cold temperatures, hot temperatures, the number of times we open and shut a door, etc. It doesn't take all that much work to understand the worst-case stress that your product will experience. Good designers will design to survive that stress; then it's a matter of testing at that stress.
 
Take a look at this post as well: Unrealistic Packaging Validation Sample Size

Your sample size needs to be based on risk, meaning that if the harm associated with a non-sterile device has a high severity, your sample size should achieve high confidence and reliability. My guess is that you don't know what your nominal failure rate is, so how can you justify a reduced sample size if the reduction is based on a 5x calculation? Typically, when it comes to shipping tests, FDA expects to see 29 samples that were tested per ASTM D4169 with results showing no failures (no breach of the sterile barrier).
 

Bev D

Heretical Statistician
Leader
Super Moderator
There is no physical or theoretical basis for the 5X statement...it’s a ‘rule of thumb’...
 