Sample size for design verification of a variable in a single-use device


Engineering Guy

Hi All, long time reader, first time poster. I am a mechanical engineer, frequently involved in "bench testing" of medical devices.

I have trawled through the forums many times over the last few years on sample size selection and never come away much wiser. I've asked as many colleagues as I can in my 7 years in the medical industry, but have never been given a good reference for developing a 'valid statistical rationale' for design verification. I have seen many good references for process capability and batch acceptance, but never for design verification. To date I have either been using a standard which dictates the sample size (e.g. sterilization validation and BioC) or somewhat arbitrarily picking a number that I think will represent the nominal design. Some of my testing has been submitted in 510(k)s or in EU submissions for Class II devices. My sample size rationale has never been questioned, but more and more I hear people talking about valid statistical rationale.

So here is my current case:

I have a disposable medical inhaler device - for the sake of argument let's say it's a gas mask which filters incoming air with activated carbon, and which you use for only 20 minutes before discarding - which I need to test for airway particulate. The device is already manufactured and sold. Batch sizes are between 25k and 100k. ISO 18562-2 defines a limit for airway particulate emitted by the device: 12 µg/m³ (12 micrograms of particulate per cubic metre of air inhaled through the device). The standard says the recommended test methods are "Type Tests", which to me means 1 sample. However, given this is a disposable device and the particulate is likely to vary - it does contain activated carbon granules which are assembled into the device manually - I am concerned that 1 sample would not represent the nominal design. What rationale can I use in selecting a sample size for this application?

I have done some preliminary tests which indicate the average is 4 µg/m³ across 10 devices. Without getting into details, that test took 10 hours, and does not allow me to measure the particle density for individual devices, only the average across all 10.

Note that the above standard has only recently become recognized, hence why I am testing after the product is released (nothing dodgy here!)

TIA!
 

Mike S.

Happy to be Alive
Trusted Information Resource
What exactly is the definition of “valid statistical rationale”? What does that mean? And where is it specified?

Of course, 1 sample is barely representative of anything, especially when you make huge lot quantities (25k+) of those things. Even a 9-piece sample with a c=0 sampling plan is only about a 10% AQL. But, depending on the answer to the above, maybe all you need to do is specify some statistics that describe your sampling plan.
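To illustrate why small c=0 samples say so little about big lots: the probability that an accept-on-zero plan passes a lot can be computed straight from the binomial distribution (a fine approximation when the lot, 25k+, dwarfs the sample). This is just a generic sketch, not anyone's official sampling table; the function name is my own.

```python
# Probability that a c=0 (accept-on-zero-defects) sampling plan
# accepts a lot, as a function of the lot's true defect rate p.
# Binomial model: valid when the sample is tiny relative to the lot.

def accept_prob(n: int, p: float) -> float:
    """P(zero defects observed in a random sample of n) = (1 - p)^n."""
    return (1.0 - p) ** n

for p in (0.01, 0.05, 0.10, 0.20):
    print(f"n=9, true defect rate {p:.0%}: P(accept) = {accept_prob(9, p):.2f}")
```

Run it and you'll see that a 9-piece zero-defect sample still accepts a 10%-defective lot roughly 39% of the time, which is why one unit (n=1) tells you almost nothing.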

Can you measure 1 at a time or only 10 at a time? How long does it take for 1 at a time?
 

Bev D

Heretical Statistician
Leader
Super Moderator
Can you confirm that you are doing design verification and not lot acceptance?

In the case of a device that filters particulates, the design verification sample size will be justified physically more than statistically. For example, filters are typically designed to filter a range of particulates down to some minimum size (particles under a certain size will not be filtered out). If your design is intended to filter certain types of particulates at a particular maximum density in the air, then you would subject a number of devices to that type of particulate at the maximum expected density and measure the critical criteria, at least at the end point. The number to test might simply be a single device at each of the min and max tolerances of the critical factors that control the filtering. If there are many critical characteristics, you might just test at the tolerance levels expected to be the best and the worst at filtering. A more robust approach is to characterize the deterioration in filtering over time for those levels; this will characterize the 'margin' your design has against natural variations in the manufacturing process. I would ask my teams to test a 'handful' of devices. In this case there really is no reason for a statistical sample size, as you are in design verification.
 

Engineering Guy

Mike and Bev, thanks for your response.

Mike, I can't post a live link yet due to restrictions, but google "21CFR820.250" and see the first result regarding "valid statistical rationale." I can only measure 10 at a time with my current test method. Without getting into the detail, measuring 0-12 micrograms is very difficult, so I put more than 1 m³ of air through the devices (as recommended by the standard). Because the devices are used for only 20 minutes at 80 L/min, I have to put 10 devices through the test before I can expect any measurable amount of particulate on the filter.
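For anyone following along, the air-volume arithmetic behind pooling 10 devices works out like this (a quick sketch using only the figures quoted above; variable names are my own):

```python
# Air volume per device: 20 minutes of use at 80 L/min.
flow_l_per_min = 80
use_minutes = 20
volume_per_device_m3 = flow_l_per_min * use_minutes / 1000  # litres -> m^3

devices = 10
total_volume_m3 = devices * volume_per_device_m3

print(f"Per device: {volume_per_device_m3} m^3")   # 1.6 m^3 per device
print(f"10 devices: {total_volume_m3} m^3")        # 16 m^3 pooled

# At the ISO 18562-2 limit of 12 ug/m^3, the maximum allowable
# particulate mass collected across the pooled 10-device run is:
limit_ug_per_m3 = 12
max_mass_ug = limit_ug_per_m3 * total_volume_m3
print(f"Max allowable collected mass: {max_mass_ug} ug")
```

So each device only contributes 1.6 m³ of sampled air, which is why a single device leaves so little mass on the collection filter to weigh.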

Bev - I am indeed doing design verification, not lot acceptance. The test result will likely be submitted in a 510(k) and filed under bench testing in the Tech File. I only mentioned lot sizes to give you a feel for intended volumes. There may need to be some lot acceptance testing introduced in future, depending on my results.
Naturally, in abstracting my device I have misled you about the purpose of my testing. But I will continue with the breathing-mask example: what I need to test is not the ability of the device to capture incoming particulate, but whether the device itself is creating unintended particulate, i.e. activated carbon dust making its way into the airstream. So we supply very clean air to the gas mask intake and measure any particulate that would pass into the patient's mouth/lungs.

What you said about testing filtering ability makes sense, and it sounds like I'm on the right track. Would you have any suggestions, considering my (attempted) clarification?

Cheers
 

Mike S.

Happy to be Alive
Trusted Information Resource
Hmmmmm.... 820.250 uses the term "valid statistical rationale" but does not define it. Leave it to the government -- they might as well have said "must be a good sampling plan" :rolleyes:.

Aerospace sampling document ARP9013 uses the term "statistically valid" but goes on to define what that means.

As I see it, you could create a whitepaper that explains your company's process and the rationale you have for using it and hope it is adequate, or you could ask your customer or regulatory body for advice or further clarification.
 

Quality_Strong

Registered
Everywhere I have worked, sample size has been determined through your risk management process. The risk management process usually determines a severity and occurrence of specific failure modes and assigns a risk category. The risk category is usually associated with a given statistical sample size.

The sample size itself is usually based on a confidence and reliability level; typically I've seen levels of 85%, 90%, 95%, and 99%. Combined, these give you an actual number that needs to be tested, based on the assumed distribution characteristics.
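For attribute (pass/fail) testing, the confidence/reliability pairing described above is usually turned into a number via the standard zero-failure "success-run" formula, n = ln(1 − C) / ln(R). A minimal sketch (the function name is mine, and this assumes zero failures are allowed in the test):

```python
import math

def success_run_n(confidence: float, reliability: float) -> int:
    """Zero-failure (success-run) sample size: the smallest n such that
    passing all n units demonstrates the stated reliability at the
    stated confidence, i.e. 1 - reliability**n >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

for c, r in [(0.90, 0.90), (0.95, 0.90), (0.95, 0.95), (0.99, 0.99)]:
    print(f"C={c:.0%}, R={r:.0%}: n = {success_run_n(c, r)}")
```

This reproduces the familiar figures (e.g. n = 29 for 95% confidence / 90% reliability, n = 59 for 95/95). For a variable like particulate density, a normal tolerance-interval k-factor approach would be used instead, but the confidence/reliability inputs are the same idea.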
 

david316

Involved In Discussions
Everywhere I have worked, sample size has been determined through your risk management process. The risk management process usually determines a severity and occurrence of specific failure modes and assigns a risk category. The risk category is usually associated with a given statistical sample size.

The sample size itself is usually based on a confidence and reliability level; typically I've seen levels of 85%, 90%, 95%, and 99%. Combined, these give you an actual number that needs to be tested, based on the assumed distribution characteristics.

Is the risk category used for sample size prior to any risk mitigators being put in place, or post?

Thanks
 

Quality_Strong

Registered
Is the risk category used for sample size prior to any risk mitigators being put in place, or post?

Thanks
If this is a new product, I would say prior to any risk mitigation. This way, you will be evaluating something closer to a worst-case condition. You may reduce the assigned occurrence value of a particular failure mode with the justification that you implemented the risk mitigator. This will likely help you reduce the sample size for any future tests that you perform post-launch.
 

david316

Involved In Discussions
Hmmm, I may need to elaborate a little. Typically when doing risk management, an unacceptable risk (based on severity and probability of occurrence) may be identified. A risk mitigator is used to lower the risk to an acceptable level, and the mitigator needs to be verified with objective evidence that it functions as intended. To define a sample size to test said mitigator, it makes sense to me to use the pre-mitigation level, as this effectively tells you how important the risk you are reducing is. Using the post-mitigation risk level doesn't make sense to me. Thoughts?
 

Bev D

Heretical Statistician
Leader
Super Moderator
Since mitigation almost never changes severity, only the probability of occurrence can be reduced. The lower the occurrence rate, the higher the sample size must be to detect a defect. Your measurement system must also be fairly good to detect a defect; occurrence and detection are not independent.

Your sample size will vary depending on what stage you are in, what kind of testing/inspection you are doing, and the severity, actual defect rate, and your MSA results. The more severe the effect, the lower the defect rate, and the worse your MSA, the higher the sample size should be. In verification and validation, the sample size can be reduced by testing at worst-case input and use conditions. In ongoing inspection, the sample size can be reduced if you have continuous data and you apply statistical process control. If you have categorical data, the sample size will increase, up to 100% inspection or error-proofing.

Determining sample size is not a cookbook thing. It requires thought, knowledge and understanding. If you can give us specific details we can help guide you to a sample size. (It takes me a week to teach my students the fundamentals of sample size.)
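To put a number on "the lower the occurrence rate, the higher the sample size must be": with categorical (pass/fail) data, the smallest sample giving at least confidence C of seeing one or more defects at a true defect rate p is n = ln(1 − C) / ln(1 − p). A sketch (my own function name, and it assumes a perfect measurement system, i.e. no defects are missed when sampled):

```python
import math

def n_to_detect(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one defect in the sample) >= confidence,
    for a true defect rate p, assuming defects are never missed by the
    measurement system (a perfect-detection assumption)."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

for p in (0.10, 0.01, 0.001):
    print(f"true defect rate {p:.1%}: n = {n_to_detect(p)} for 95% confidence")
```

At a 10% defect rate you need 29 pieces for 95% confidence of seeing a defect; at 0.1% you need almost 3,000, which is exactly why rare, severe failure modes drive sample sizes up so quickly.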
 