# Interesting Discussion: Sample Size Determination for Medical Device

#### JoyG

We will perform design verification on a medical device (portable tabletop) intended for diagnostic purposes. I'm looking for a sample size rationale for the functional/performance characteristic challenges in design verification. Below is the statistical rationale I'm proposing. I'd like feedback on whether this rationale is justifiable and, if not, what an appropriate statistical rationale would be.
Is this sample size rationale valid for reusable capital equipment?
Can I divide the number of trials or samples across multiple devices?

"Using the Binomial distribution to calculate the sample size at 95% confidence the probability to detect no more than zero defect present out of 95% of the population would require at-least 59 number of trials or samples. Since the device is reusable capital equipment the number of trials will be divided into three devices where each device will perform 20 trials with total of 60 trials. This will discover any inherit risk present in the device. The device gets 100% functionally tested during manufacturing."

#### yodon

Super Moderator
What is the intended user / patient population? Can different users / patients (age, weight, sex, race, ...) have any effect? What kind of tests are these? Mechanical, performance, etc.? That might have a bearing. Are you saying that only 3 devices will be used, with each undergoing 20 tests?

Sorry I can't answer your question but maybe this will help stimulate discussion.

#### Bev D

##### Heretical Statistician
Super Moderator
Are you testing the laptop or your software on the laptop?

#### JoyG

Hi Yodon, yes this would be 3 devices each undergoing 20 tests vs. producing 60 units to collect data from 1 trial with each device. Do you think it would be valid to perform multiple trials across 3 units to collect the 60 trials worth of data for design verification testing (mechanical and performance testing)?

#### Bev D

##### Heretical Statistician
Super Moderator
In my organization we manufacture diagnostic instruments, so we have a similar dilemma.
We calculate the required sample size as the number of runs necessary on a single instrument.
Because performance isn't homogeneous across instruments, we run that sample size on multiple instruments.

#### Mark Meer

Trusted Information Resource
I'm not certain I'm following the rationale here. We're talking design verification, correct?

Suppose you have 60 requirements to verify and, for simplicity, have developed 60 corresponding verification tests. It seems as though the discussion is between either:
(a) Using a single device to do all 60 tests; or
(b) Using 3 devices, and dividing the tests among them (so each unit is subject to 20 tests).

It's unclear how you justify selecting one or the other. If you acknowledge that performance isn't homogeneous across devices, then shouldn't you be performing all 60 tests on multiple units?

#### Bev D

##### Heretical Statistician
Super Moderator
If the performance wasn’t homogeneous across devices (and it often isn’t, so we always assume it isn’t until we demonstrate that it is), we would test all characteristics across multiple instruments with multiple runs. Often the number of runs will be in the 50-100 range or more, depending on what we are looking for. For our single-use devices we will test up to 100 or more devices from 3 independent lots.
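
To give a feel for what run counts in that range can demonstrate, here is a rough sketch that inverts the zero-failure binomial relationship. It assumes a pass/fail characteristic, independent runs, and zero observed failures. None of that is stated in the post, and the function name `demonstrated_reliability` is made up for illustration.

```python
def demonstrated_reliability(runs: int, confidence: float = 0.95) -> float:
    """Lower bound on the pass rate supported by `runs` passes and zero failures."""
    if runs <= 0 or not (0 < confidence < 1):
        raise ValueError("runs must be positive and confidence strictly between 0 and 1")
    return (1 - confidence) ** (1 / runs)


for runs in (50, 100):
    # Assumed scenario: every run passed.
    print(f"{runs} runs -> {demonstrated_reliability(runs):.3f}")
```

Under those assumptions, 50 clean runs support roughly a 94% pass rate at 95% confidence, and 100 clean runs roughly 97%.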

For characteristics that are sensitive to specific use conditions (protocol or specimen conditions), we can manage the sample size by directed testing: we test under worst-case conditions to ensure that we have margin.

#### Mark Meer

Trusted Information Resource
Thanks for the clarification, Bev. But I'm afraid (as is often the case) I still don't think I'm grasping this, at least not how your approach can be applied in practice...

Again, we are talking design verification, correct?
If so, how is it that you have so many units, and multiple independent lots before the design has even been verified? I presume you're working with a product that can be easily (and inexpensively) produced in quantity?

#### Bev D

##### Heretical Statistician
Super Moderator
I’m not an expert in hearing aids (my only exposure is that my wife wears them and I’m in charge of replacing the batteries).

Usually these things are not about some statistical calculation of the sample size, although that is important. It’s about understanding what constitutes the ‘population’.

In design verify, a re-usable device such as a hearing aid would have a smaller sample size of the device than a single use device. Sample size will still depend on what characteristics you need to verify.

Although some characteristics are deterministic - either it works or it doesn’t - some variation will have to be included. For example battery fit. Batteries have a specified size range and your design has a specified size range. At a minimum I would expect that you would need to ensure that the minimum and maximum battery size fits in the minimum and maximum specified sizes of your hearing aids. So the sample size is determined by counting the conditions to be verified.

Another example: I’m sure you have requirements regarding the wear of the battery compartment. How many battery changes are you targeting? That, together with your confidence level and the margin you need for your requirement, will tell you the sample size: how many insertions do you need? Of course you could go to directed testing around stress vs. strength that would not require actual insertions until validation with ‘production’ level parts.

Some characteristics are subject to inherent variation from use conditions: for example, I can imagine that you need to prove that the hearing aid meets output requirements given a range of sound volume and frequency as well as ‘fit’ in the ear? In this case your sample size is more dependent on the range of each condition than anything else. The number of devices is less important than the number of conditions. Do you just test at the extremes and the nominal? Or across the range in a distribution of conditions that is representative of the conditions that exist in actual use? It is also dependent on whether or not the hearing aid is consistent in its performance across the range. In this case of design verify you can usually have a small number of devices tested across a full range of conditions. I typically would recommend 3 devices just to be safe. The ‘number of conditions tested on each device’ is then the real sample size.
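
The "count the conditions" idea can be sketched as a small test matrix. The factor names, the three levels per factor, and the unit labels below are all hypothetical, chosen only to mirror the extremes-plus-nominal option mentioned above. A real plan would take its factors and ranges from the design requirements.

```python
from itertools import product

# Hypothetical units and use-condition factors (not from the thread).
devices = ["unit-1", "unit-2", "unit-3"]
levels = ["low", "nominal", "high"]  # extremes plus nominal
factors = {
    "input_volume": levels,
    "frequency": levels,
    "fit": levels,
}

# Every combination of factor levels is one condition to verify.
conditions = list(product(*factors.values()))

# Each device is run through the full set of conditions.
runs = [(device, condition) for device in devices for condition in conditions]

print(len(conditions))  # conditions per device, the "real" sample size
print(len(runs))        # total runs across all devices
```

With three factors at three levels each, this gives 27 conditions per device and 81 runs in total. Adding a fourth device adds 27 runs, while adding a fourth level to each factor more than doubles the conditions per device.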

#### Bev D

##### Heretical Statistician
Super Moderator
It might depend on what you use 'design verify' for. There are many active working definitions of this. In our case we do design verify with prototypes. One of our product types is a single-use device that is a diagnostic assay. We always demonstrate at the prototype stage that our design can meet our targeted sensitivity and specificity claims. This requires more than 100 blood samples to be tested in a matched-pair fashion: each sample must be tested on the gold standard and on a prototype device. Since each device is single-use, we use over a hundred devices. We also build 3 prototype lots because we know that there is a difference between lots; it is the nature of biologicals. This isn't what I call cheap, and we must do it in design verify to confirm that we are ready to move to the validation stage, when we submit to the USDA for licensure. We have the same approach for larger instruments that are multiple-use. We make only a small number of them, but we run hundreds of runs on each...
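
As an illustration of the matched-pair comparison described above, here is a minimal sketch that computes sensitivity and specificity against a gold standard, with Wilson score intervals. The counts are invented for the example, and the Wilson interval is my choice of method, not something the post specifies.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0 or not (0 <= successes <= n):
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical results: each sample tested on both the gold standard and a prototype.
true_pos, false_neg = 48, 2   # gold-standard positives
true_neg, false_pos = 57, 3   # gold-standard negatives

sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)

print(sensitivity, wilson_interval(true_pos, true_pos + false_neg))
print(specificity, wilson_interval(true_neg, true_neg + false_pos))
```

The width of these intervals is what drives the number of samples: with only about 50 positives, an observed 96% sensitivity is still consistent with a true value in the high 80s.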