Sample Size justification for design verification testing

d_addams · Jul 18, 2025

My $0.02 is that there is no 'correct' sample size, but the assurance level you select (which can then be used to set a sample size, whether you use attribute or variables measures) must be risk based. So if you haven't established that the assurance levels (i.e. sample sizes) laid out in your QMS policy are risk based, you'll never get the thumbs up from the regulator. Another way to make this point: you could have tested a million parts and the FDA would still reject you if you can't demonstrate that the assurance level necessary to pass was a risk-based decision.

A risk-based decision means you've categorized the risk associated with non-compliance and then used that risk categorization as an input to your decision. This is normally reflected in a policy where the risks are categorized (typically along the dimensions of occurrence and severity) and then, based on those categories, a requisite assurance level is established.

If people want to demonstrate true understanding of this aspect of risk management, they need to start using the words 'assurance level' where they typically use 'sample size', because what you really need is an assurance level. Sample size can only be discussed once you establish a method of analysis; it can't be established before the method is defined. Your QMS should be method agnostic and thus should dictate assurance level, not sample size.
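To make the "policy dictates assurance level, not sample size" idea concrete, here is a minimal sketch of what such a policy lookup could look like. Everything in it is hypothetical: the 1 to 5 severity and occurrence scales, the category cut-offs, and the confidence/reliability numbers are placeholders for illustration, not values from any standard or from anyone's actual QMS.

```python
from typing import NamedTuple


class AssuranceLevel(NamedTuple):
    confidence: float   # e.g. 0.95 means 95% confidence
    reliability: float  # e.g. 0.99 means 99% of units conform


# HYPOTHETICAL policy table: the numbers are illustrative placeholders only.
POLICY = {
    "high": AssuranceLevel(confidence=0.95, reliability=0.99),
    "medium": AssuranceLevel(confidence=0.95, reliability=0.95),
    "low": AssuranceLevel(confidence=0.90, reliability=0.90),
}


def risk_category(severity: int, occurrence: int) -> str:
    """Map severity and occurrence (each scored 1..5) to a risk category.

    The cut-offs are assumptions made up for this sketch.
    """
    if not (1 <= severity <= 5 and 1 <= occurrence <= 5):
        raise ValueError("severity and occurrence must each be in 1..5")
    if severity >= 4:
        return "high"  # severe harm drives the category regardless of occurrence
    if severity * occurrence >= 6:
        return "medium"
    return "low"


def required_assurance(severity: int, occurrence: int) -> AssuranceLevel:
    """Return the assurance level the (hypothetical) policy requires."""
    return POLICY[risk_category(severity, occurrence)]


print(required_assurance(severity=4, occurrence=1))
print(required_assurance(severity=2, occurrence=3))
```

Note that nothing here mentions a sample size. That only appears later, once a method of analysis (attribute or variables) is chosen and fed the assurance level.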

Bev D · Jul 18, 2025

This is also true... "sample size" is far more complicated than plugging numbers into a formula. Too many people just want a single, clear, bright-line yes or no answer. Thinking is not yet common. I used to tell my students that while there are many correct sample sizes, there are far more incorrect and wrong sample sizes.

Tidge · Jul 18, 2025

jjacks said:
We have a reusable medical device which needs use life testing after high level disinfection. The sample size is the number of cycles, correct? Also, how do we determine sample size for end of life testing? Considering using the same sample size we used for biocomp testing.

There are a couple of different things wrapped up in this question. I'm going to ignore the disinfection part.

For the reusable part of the question, the number of cycles isn't the "sample size"; it is probably better to think of the number of cycles as something often called a "parameter" or "characteristic".

As noted by @d_addams , you will have to use an analysis (based on risk, certainly... but also business performance, keep reading) to verify that your design will still work after N sterilization cycles. "Still work" is the (binomial) attribute, N sterilization cycles is the parameter. You will want a study design that will establish with some Likelihood and Confidence level that after N cycles, the device will still work.

Straightforward math gets the sample sizes for different values of Likelihood and Confidence. Personally: If the number of anticipated sterilization cycles is large, I would think that the clinical risk assessment for the device not working after N (>> 1) cycles is relatively low (I am imagining some sort of performance issue that is trivial to detect), such that failures after N cycles represent a business warranty issue and not a safety issue... which is a long way for me to get to something like a confidence level of only about 90%... so that a 99% Likelihood / 90% Confidence level would require no failures from a minimum sample size of 44 (different, representative) devices (or only 1 out of 64, or whatever...).
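For anyone who wants to check the "straightforward math" themselves, here is a minimal sketch, assuming the usual one-sided binomial demonstration test (with zero allowed failures this reduces to the success-run formula n = ln(1 - C) / ln(R)). The function name and arguments are my own, and I am reading "Likelihood" as reliability, which is an assumption. Under that reading the formula gives 230 units for 99% / 90% with zero failures, while 45 corresponds to 95% / 90% and 22 to 90% / 90%, so any figure quoted from memory or from a table is worth re-checking against the reliability actually intended.

```python
from math import comb


def min_sample_size(reliability: float, confidence: float, max_failures: int = 0) -> int:
    """Smallest n such that seeing at most `max_failures` failures in n units
    demonstrates `reliability` at `confidence` (one-sided binomial test)."""
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must be strictly between 0 and 1")
    if max_failures < 0:
        raise ValueError("max_failures must be >= 0")

    p_fail = 1.0 - reliability
    alpha = 1.0 - confidence
    n = max_failures + 1
    while True:
        # Chance of passing the test if the true reliability sits exactly at
        # the limit; the test is adequate once this is no larger than alpha.
        p_pass = sum(
            comb(n, k) * p_fail**k * reliability ** (n - k)
            for k in range(max_failures + 1)
        )
        if p_pass <= alpha:
            return n
        n += 1


# Zero-failure sample sizes for a few reliability / confidence pairs.
for r, c in [(0.90, 0.90), (0.95, 0.90), (0.99, 0.80), (0.99, 0.90)]:
    print(f"R={r:.0%}, C={c:.0%}: n = {min_sample_size(r, c)}")
```

Allowing one failure raises the requirement, e.g. `min_sample_size(0.95, 0.90, max_failures=1)` for a 95% / 90% plan.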

Personally: If the target number of sterilization cycles is under 15... I'd construct the study design to repeat functionality testing after each cycle and I wouldn't STOP the study until I've completed TARGET+2 cycles. (Of course this can be a lot of testing, so if I was cost-limited I'd probably go with a lower confidence, maybe only 99% Likelihood and 80% confidence, to get the sample size down to 0 failures from 21.) I'm slyly trying to collect something like 'variable' data in case the design is prone to failure. If the design is robust, going past the TARGET generates some extra evidence that you have achieved the TARGET, and you can leverage the 'excess' data.
