Determining Sample Size in Design V&V activities

Hi All,
First, I'd like to thank you all for the insightful comments, which helped clear things up, but I'm still not sure I'm there yet.
I have a couple of questions that some of you may be able to help with:
1. For software verification and validation testing, do you think the rationale that SW is a deterministic process, and therefore has a standard deviation of 0, justifies testing pure SW on a sample size of 1 device?
2. For design verification and validation of an expensive piece of equipment, for example an MRI or CT scanner, it probably would not make sense to test 20-30 systems. How would you tackle this?
If a test protocol tests many different variables, how would we go about explaining its sample size? Each variable would need a different sample size, and the sample size should be defined during planning, before there are any actual results.
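One commonly used way to fix a pass/fail sample size at the planning stage, before any results exist, is the success-run theorem: if n units all pass with zero failures, you can claim reliability R at confidence C when n >= ln(1 - C) / ln(R). This method is not from the thread; it is a minimal sketch assuming attribute (pass/fail) data, independent units, and zero allowed failures. Variables data would need a different approach, typically one that requires a planning estimate of the standard deviation.

```python
import math

def success_run_sample_size(confidence: float, reliability: float) -> int:
    """Units that must all pass (zero failures) to claim `reliability`
    at `confidence` under the binomial success-run model."""
    if not (0 < confidence < 1 and 0 < reliability < 1):
        raise ValueError("confidence and reliability must be in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

def demonstrated_reliability(n: int, confidence: float) -> float:
    """Lower-bound reliability shown by n passes and zero failures."""
    return (1 - confidence) ** (1 / n)

# Sample sizes for a few confidence/reliability targets chosen at planning.
for c, r in [(0.90, 0.90), (0.95, 0.90), (0.95, 0.95)]:
    print(f"C={c:.0%}, R={r:.0%}: n = {success_run_sample_size(c, r)}")

# What a small sample can actually demonstrate at 95% confidence.
for n in (1, 3, 29):
    print(f"n={n}, C=95%: R >= {demonstrated_reliability(n, 0.95):.3f}")
```

At 95% confidence and 90% reliability this gives n = 29, which may be one reason figures like "20-30 systems" come up. Run in reverse, a single passing unit demonstrates only about 5% reliability at 95% confidence.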

I hope some of you can help clear things up.


Bev D

Heretical Statistician
Staff member
Super Moderator
We must remember that sample size is not controlled only by the number of units when we have an instrument or device that will be used more than once. This is the case with software and with medical devices such as CT or MRI scanners. (Examples of single-use things are pills, vaccines, blood collection tubes, etc.)

For multiple-use things, sample size includes both the number of things and the number of uses (or runs) of each thing.
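One way to see why units and runs are not interchangeable is the variance of the overall mean in a balanced nested design with k units and r runs per unit: Var(mean) = sigma_unit^2 / k + sigma_run^2 / (k * r). Extra runs shrink only the second term; only extra units shrink the unit-to-unit term. This is a minimal sketch with hypothetical variance components (both set to 1.0); real values would come from pilot data or prior builds.

```python
def variance_of_mean(sigma_unit: float, sigma_run: float, units: int, runs: int) -> float:
    """Variance of the grand mean for a balanced nested design:
    `units` devices, each measured `runs` times."""
    if units < 1 or runs < 1:
        raise ValueError("units and runs must be >= 1")
    # Unit-to-unit term shrinks only with more units;
    # run-to-run term shrinks with total observations (units * runs).
    return sigma_unit**2 / units + sigma_run**2 / (units * runs)

# HYPOTHETICAL variance components, equal for illustration only.
SIGMA_UNIT, SIGMA_RUN = 1.0, 1.0

# Three plans with the same 30 total observations.
for units, runs in [(1, 30), (3, 10), (30, 1)]:
    v = variance_of_mean(SIGMA_UNIT, SIGMA_RUN, units, runs)
    print(f"{units:>2} units x {runs:>2} runs: Var(mean) = {v:.3f}")
```

With the same 30 observations, one unit run 30 times leaves the variance near 1.03, while three units run 10 times each drop it to about 0.37. Repetition on a single device cannot remove the unit-to-unit component.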

Statistically speaking, I would use at least 3 of the things regardless of the cost. There may be exceptions, and only your reviewer/statistician can answer that question.

As for software, it is deterministic, but only to a point. Remember that software doesn't exist in isolation from the device, the user, or the conditions of use. Conditions and varying inputs will affect the results. So your sample size for software will be based on both the number of devices (the physics of different device builds dictate that they will not be identical) and the number of runs or uses...
Thank you for the reply; that cleared up the question of repetitions (runs) counting toward the sample size.
In your experience, would using the AQL tables from ANSI Z1.4, although not optimal for V&V practice (as I read in the article Statistical Sampling Plan for Design V&V of Medical Devices by Liem Ferryanto, Ph.D.), suffice as a statistical sample size rationale in an FDA audit?
I noticed that for a lot/batch size of 2-8 units at an AQL of 6.5% nonconforming, the plan accepts with 0 failures on a sample size of 2 units.
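To see what such a plan actually protects against, you can compute its probability of acceptance (the operating characteristic). This is a minimal sketch using the binomial model, P(accept) = sum over i = 0..c of C(n, i) p^i (1 - p)^(n - i), with n = 2 and acceptance number c = 0 as in the post. It ignores finite-lot effects, which for lots of 2-8 units would strictly call for the hypergeometric distribution.

```python
from math import comb

def prob_accept(n: int, c: int, p: float) -> float:
    """Binomial probability of accepting: c or fewer nonconforming in n."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(c + 1))

# n = 2, c = 0: accept only if both sampled units conform.
for p in (0.065, 0.20, 0.50):
    print(f"p={p:.1%}: P(accept) = {prob_accept(2, 0, p):.3f}")
```

With c = 0 this reduces to (1 - p)^2, so a population that is 50% nonconforming would still pass about a quarter of the time. AQL plans are built to protect the producer from rejecting good lots, which is part of why they are considered a weak fit for design V&V.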

Thanks in advance!

Julie O

my company has to define a procedure for defining sample sizes regarding design verification.
It sounds like you are trying to figure out what methods to use to determine sample sizes for design verification. Like Ronen, I'd want to see exactly what the inspector wrote, but I think the procedure the inspector was looking for was a high-level procedure that describes the process you follow to decide which statistical methods to use to determine sample sizes for design verification.

I wouldn't expect to see any statistical methods cited in this procedure. The procedure should be applicable to any and all design you verify, and there is no way of knowing what types of verification you might need for future designs, much less which statistical methods might be appropriate to use to determine the sample sizes.


Involved In Discussions
For software design V&V, you test the design to see if it works, full stop. How much testing you do, i.e. to what depth, is up for debate (see the FDA guideline General Principles of Software Validation).

You might choose to do usability testing, and for that you will have to justify the size of the user/test group; again, there is an FDA guideline on usability to help you there.

But you only need to exercise the software once.
I've always been confused by this concept of statistically determining sample sizes for design verification activities.

As was mentioned in this thread, it really depends on the product and characteristics, no?

I'd like to pose a related question: when using standards (e.g. IEC 60601) to verify design requirements, how do these justify sampling?

For example, we had a design requirement "shall be unaffected by a drop from 1m onto hard surface", which was verified through the IEC 60601 drop test. The 60601 testing, however, used (if I recall correctly) just 2 units, without justifying this number. The FDA didn't seem to have any problem with this...

Julie O

it really depends on the product and characteristics, no?
I agree. That's why I think the inspector might not be looking for a procedure that identifies specific approaches to determining sample sizes for specific types of verification testing, but a high-level procedure that describes the process the company will follow to decide all that for whatever type of verification it might do.
Hi Mark,

60601-1 explicitly states that the tests are conducted on one unit.

5.2 TYPE TESTS are performed on a representative sample of the item being tested.
Note also that the tests are expected to be conducted in a specific order (see Annex B), and the unit must still pass all requirements after any damage that may result from earlier tests. So the marking legibility test is the last test, after all the rubbing, cleaning simulations, heat, humidity, etc.
60601-1 explicitly states that the tests are conducted on one unit.
Thanks Pads. Only one representative sample then.

So while the FDA is harping on the OP to justify his/her sampling for their design verification activities, industry standards (which the FDA recognizes) use a sample of 1 for verifying requirements?

Bev D

There is an element of confusion here: guysta was asking about software validation, and Mark, you invoked testing for specific characteristics covered by industry test standards. They are markedly different beasts. Where an accepted standard exists, the FDA (or any other regulatory body that accepts that standard) will accept the sample size, or at least the auditor/reviewer should accept it.
For medical devices in general, there is no set sample size for function and/or general hazards; the sample size should be statistically justified in light of the risk. For software, there is considerable 'discussion' regarding the deterministic nature of software and the conditions and inputs it receives from the device it is running on...
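One way to make "statistically justified in light of the risk" concrete is a binomial reliability demonstration: choose the smallest n such that, if the true failure rate were 1 - R, the chance of seeing no more than the allowed number of failures would be at most 1 - C. With zero allowed failures this reduces to the success-run result. This is a minimal sketch; the risk-to-(confidence, reliability) mapping below is entirely hypothetical and would have to come from your own procedure and risk management file.

```python
from math import comb

def prob_at_most_c_failures(n: int, c: int, p_fail: float) -> float:
    """Binomial probability of observing c or fewer failures in n units."""
    return sum(comb(n, i) * p_fail**i * (1 - p_fail)**(n - i) for i in range(c + 1))

def demo_sample_size(confidence: float, reliability: float, allowed_failures: int = 0) -> int:
    """Smallest n such that passing with <= allowed_failures demonstrates
    `reliability` at `confidence` (binomial reliability demonstration)."""
    p_fail = 1 - reliability
    n = allowed_failures + 1
    while prob_at_most_c_failures(n, allowed_failures, p_fail) > 1 - confidence:
        n += 1
    return n

# HYPOTHETICAL risk-to-(confidence, reliability) mapping, for illustration only.
RISK_LEVELS = {
    "high": (0.95, 0.99),
    "medium": (0.95, 0.95),
    "low": (0.90, 0.90),
}

for level, (conf, rel) in RISK_LEVELS.items():
    n0 = demo_sample_size(conf, rel, 0)
    n1 = demo_sample_size(conf, rel, 1)
    print(f"{level:>6}: C={conf:.0%}, R={rel:.0%} -> n={n0} (0 failures), n={n1} (1 failure)")
```

Allowing one failure raises the required sample size (for example from 29 to 46 at 95% confidence and 90% reliability), which is the usual trade-off when choosing a plan that tolerates an occasional unexplained failure.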
