Sample Size Determination for Medical Devices

david316

Involved In Discussions
"Some characteristics are subject to inherent variation from use conditions: for example, I can imagine that you need to prove that the hearing aid meets output requirements over a range of sound volumes and frequencies, as well as ‘fit’ in the ear. In that case your sample size depends more on the range of each condition than on anything else. The number of devices is less important than the number of conditions. Do you test just at the extremes and the nominal? Or across the range, in a distribution of conditions that is representative of actual use? It also depends on whether the hearing aid performs consistently across the range. For design verification like this you can usually test a small number of devices across the full range of conditions. I would typically recommend 3 devices just to be safe. The ‘number of conditions tested on each device’ is then the real sample size."
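A minimal sketch of what such a condition matrix might look like (the factor names and levels below are purely hypothetical, just to make the idea concrete):

```python
from itertools import product

# Hypothetical use-condition factors for a hearing aid, each tested
# at its extremes plus nominal (levels are illustrative only).
factors = {
    "input_level_dB_SPL": [50, 75, 100],      # min, nominal, max
    "frequency_Hz":       [250, 1000, 8000],  # low, mid, high
    "ear_canal_fit":      ["loose", "nominal", "tight"],
}

# The full condition matrix: every combination of factor levels.
conditions = list(product(*factors.values()))
print(f"{len(conditions)} conditions per device")            # 3^3 = 27
print(f"{3 * len(conditions)} observations with 3 devices")  # 81
```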

If you only test 3 devices but over a range of conditions... how do you justify that the results from the 3 devices will generalize to all devices produced? i.e., that manufactured devices will meet their specifications over those conditions with a certain confidence and reliability?

Thanks
 

Bev D

Heretical Statistician
Leader
Super Moderator
"If you only test 3 devices but over a range of conditions... how do you justify that the results from the 3 devices will generalize to all devices produced? i.e., that manufactured devices will meet their specifications over those conditions with a certain confidence and reliability?"
The short answer is conditional probability.

But there are a few things to clarify. First, when I say 3 units, I mean 3 units built at the minimum or maximum of the design tolerances and run at worst-case use conditions. So there may actually be more than just 3 units, depending on the complexity of the device and the nature of its use.

I always think about this in terms of Development, OQ, and PQ validation. In development we should be using screening experiments, full factorials, and response surface experiments to determine the critical (input) characteristics and what tolerances are required on those characteristics to guarantee that the output meets the requirements. If we utilize physics and good industrial experimentation, we can design product that will have no or very few defects. OQ is used to confirm (or validate) that at the worst-case tolerance levels and under worst-case use conditions we have no or very few defects/failures. If you can meet requirements at the worst-case conditions, the nominal will certainly meet requirements; this is just physics. Of course, if you do have defects at the worst-case conditions, you can use conditional probability to determine the actual failure rate. PQ is typically run at nominal; this is where you can and should use the standard sample size calculations, since you are creating product at manufacturing volumes.
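To illustrate the conditional-probability point with made-up numbers: the overall failure rate is each condition's failure rate weighted by how often that condition occurs in actual use.

```python
# Illustrative only: overall failure rate from conditional failure rates.
#   P(fail) = sum over conditions of P(fail | condition) * P(condition)
conditions = {
    #                  P(condition in use)  P(fail | condition)
    "nominal":         (0.90,               0.000),
    "worst_case_hot":  (0.05,               0.020),
    "worst_case_cold": (0.05,               0.010),
}

p_fail = sum(p_c * p_f for p_c, p_f in conditions.values())
print(f"Overall failure rate: {p_fail:.4f}")  # 0.0015, i.e. 0.15%
```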

I have found that sample size calculations are needed when we haven't done a good job of design. The downside of these sample sizes is that they REQUIRE a sample drawn from, and representative of, the full population distribution. Of course, if we haven't done a good job in development, our knowledge of the physics is meager, and our knowledge of the nature of the population's distribution is meager as well.

All of that said, I know that many regulatory groups cling to old-fashioned random-sampling statistics, and there may be little one can do to change their minds.
 

david316

Involved In Discussions
It can get complicated quickly. If we consider the hearing aid example, as you alluded to there may be a number of conditions over which the hearing aid needs to fulfil its requirements, i.e., "I can imagine that you need to prove that the hearing aid meets output requirements over a range of sound volumes and frequencies, as well as ‘fit’ in the ear."

I assume it is very difficult to make a "worst case" hearing aid, and you still need to prove the hearing aid meets output requirements over a range of volume settings, etc. If environmental conditions affect the device's performance, this further compounds the complexity of the problem. In the end the fallback is often random statistics, but even this falls apart quite quickly, because the combinations of tests end up requiring sample sizes that are not manageable. In the end, statistical compromises are made...
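To put rough numbers on that combinatorial explosion (all factors and level counts below are hypothetical):

```python
import math

# Hypothetical test factors and the number of levels for each.
levels = {"volume": 5, "frequency": 7, "fit": 3,
          "temperature": 3, "humidity": 2}

combinations = math.prod(levels.values())
print(f"{combinations} condition combinations")  # 5*7*3*3*2 = 630

# Applying a common 95/95 zero-failure plan (59 units) per combination:
print(f"{combinations * 59} total units")  # 37,170 - clearly unmanageable
```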
 

Bev D

Heretical Statistician
Leader
Super Moderator
But they don't have to be. I suspect that you are confusing development work with validation... this is a sign of immaturity in the development process. Think about automotive: they test a vehicle in Alaska and a vehicle in the Mojave desert. The entire system is tested against worst-case environmental conditions.

This will sound bold, but 'statistical compromises' are physics compromises. No amount of random sampling across nominal conditions can overcome a lack of characterization and knowledge of how the system is supposed to work, nor can it predict or confirm adequate performance against requirements. I've been in both places.
 

david316

Involved In Discussions
I can understand determining worst-case environmental conditions, but do you need to test a system at the worst case of its design? For example, does the car have to be at "worst case", with every part made at the extremes of its allowed tolerances, or do you test multiple cars (a random sample) at worst-case environmental conditions?

Consider the case where we are trying to prove the requirement that a car accelerates to 100 km/h in less than 10 seconds over its intended range of environmental operating conditions.

Thanks
 

Bev D

Heretical Statistician
Leader
Super Moderator
Complex devices are tested at worst-case levels at the component level before being integrated at the system level (in best-case development). So the components in a car, for example, have been put through their paces before a random pilot car is tested under worst-case conditions. Not all development is perfect; the example is intended to point out the utility of testing under worst-case conditions - physics can make random sampling unnecessary. We must use knowledge of the system and its physics, as well as probability, to inform sample sizes. Random sampling to find random variation is simply highly inefficient and can provide a false sense of security.

In the last example of acceleration, I would use a vehicle that has worst-case components (allowable by the specification tolerances) to see if it can accelerate under the worst-case intended conditions. On the other hand, I've never seen a critical requirement for acceleration except in race cars, of which there is only one per driver...
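As a toy illustration of that worst-case check (a constant-power model with losses ignored, and every number invented): build the "car" from the unfavorable tolerance extremes and see whether it still meets the requirement.

```python
# Toy model: with constant power P, time to reach speed v is roughly
# kinetic energy over power, t = 0.5 * m * v^2 / P (losses ignored).
V_TARGET = 100 / 3.6  # 100 km/h in m/s

def time_to_speed(mass_kg: float, power_w: float) -> float:
    return 0.5 * mass_kg * V_TARGET**2 / power_w

# Nominal design vs. worst case within (invented) spec tolerances:
# heaviest allowable car paired with the weakest allowable engine.
nominal    = time_to_speed(mass_kg=1500, power_w=80_000)
worst_case = time_to_speed(mass_kg=1500 * 1.05, power_w=80_000 * 0.90)

print(f"nominal:    {nominal:.1f} s")     # ~7.2 s
print(f"worst case: {worst_case:.1f} s")  # ~8.4 s
assert worst_case < 10.0, "requirement not met at worst case"
```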
 

david316

Involved In Discussions
"In the last example of acceleration, I would use a vehicle that has worst case (allowable by the specification tolerances) components to see if it can accelerate under the worst case intended condition "

Wouldn't it be impractical to make a "worst case" car? Especially when you consider the electronic components and sensors that presumably affect performance. Personally, the idea of testing a worst-case design appeals to me, but at the system level I struggle to see how it can practically be done.
 

Bev D

Heretical Statistician
Leader
Super Moderator
Not as difficult as you would think. During development we are creating worst-case components. In medical devices we are supposed to do this as part of OQ. There are some things that are difficult to get at worst-case levels (like molded parts from a worn mold), and for those we have to make decisions based on physics.

Your argument about the difficulty of making worst-case cars is exactly why random sampling is silly. You can sample randomly all you want, but until you make a large number of cars whose parts have naturally varied in their inputs, you won't randomly capture the full range of variation. That scenario isn't design verification or design validation; it's just long-term monitoring of performance. In verification and validation you are supposed to be confirming that the full range of allowable variation will still meet requirements - how can you do that if you don't create the full range of variation?
 

david316

Involved In Discussions
" in validation and verify you are supposed to be validating that the full range of allowable variation will still meet requirements - how can you do that if you don't create the full range of variation? "

I believe it is common to use a risk-based approach. For example, it might be deemed acceptable, based on the risk, to be 99% confident with 99% reliability that a part meets its spec.
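For reference, the usual zero-failure "success-run" sample size behind such confidence/reliability statements is n = ln(1 - C) / ln(R), rounded up:

```python
import math

def success_run_n(confidence: float, reliability: float) -> int:
    """Zero-failure sample size: n = ln(1 - C) / ln(R), rounded up."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

print(success_run_n(0.95, 0.95))  # 59
print(success_run_n(0.99, 0.99))  # 459
```

This also shows why applying 99/99 per condition combination becomes unmanageable so quickly: 459 units for every cell of the test matrix.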
 