Determining Sample Size in Design V&V Activities

Ronen E

Problem Solver
Moderator
Curious: I presume that none of your clients produce low-volume devices? We produce our devices in low quantities, and so to procure 30 samples exclusively for testing purposes would be totally impractical.

For us, sample sizes have to be justified by risk, and despite the FDA, this isn't always tied to statistics.
Look up Bootstrap.
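For anyone unfamiliar with it, here is a minimal bootstrap sketch in Python (numpy only; the seven measurement values and the 2,000 resamples are made-up for illustration, not anyone's real process data). It resamples a small data set with replacement and reports a percentile confidence interval for the mean, which is one way to get an interval estimate from a handful of units without assuming normality or needing 30 samples:

import numpy as np

# Illustrative measurements from a small lot (made-up numbers)
data = np.array([20.1, 19.8, 20.3, 19.9, 20.0, 20.2, 19.7])

rng = np.random.default_rng(0)
n_resamples = 2000

# Resample with replacement and collect the mean of each resample
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(n_resamples)
])

# 95% percentile bootstrap confidence interval for the mean
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"observed mean: {data.mean():.3f}")
print(f"95% bootstrap CI for the mean: ({lower:.3f}, {upper:.3f})")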
 

QA-Man

:lol: ...Sorry, but my cynical side has to chuckle at this...

I think you're right to be cynical. I certainly am. After all, how many people come to the boards just to find an answer they can apply without necessarily understanding everything that's behind that answer? Do they really need to understand gravity and equilibrium when they learn to ride a bicycle?

Curious: I presume that none of your clients produce low-volume devices? We produce our devices in low quantities, and so to procure 30 samples exclusively for testing purposes would be totally impractical.

It is just a general strategy...sometimes it is one sample that is tested thirty times.

For us, sample sizes have to be justified by risk, and despite the FDA, this isn't always tied to statistics.

At the risk of sounding cynical, doesn't risk involve the probability of a hazard's occurrence? :cool:
 

anpun

Registered
Hello,

Could you please help me?

I want to perform verification and validation activities for an image processing machine. However, I am not sure how to select the sample size for design verification and validation. This is a low-volume product: total production is about 20 machines/year, and it is a costly product. Hence, we have decided to go with a number of repeats instead. But how many repeats should it be?
I have seen lots of verification and validation threads, and I understand people generally select 3, 5 or 7 repeats. But how should I provide justification for this?
 

Mark Meer

Trusted Information Resource
I think an underlying point of confusion is the FDA's requirement for "Sampling plans, when used, shall be written and based on a valid statistical rationale." (21 CFR 820.250(b)).

Is there any guidance as to what is accepted as "valid statistical rationale"? As I've suggested in this thread, we want selection of samples to be tied to risk...but - QA-Man's point regarding probability notwithstanding - this is pretty much a judgement call.

For example, we have a requirement the product weighs less than X. This is strictly so that we can estimate shipping, and has no bearing on safety or intended use. Therefore, in verification, we weigh the output of one unit of production, and that's it. I'd have a hard time arguing that this was a "valid statistical rationale" for choosing a sample size of 1, but for our needs it seems appropriate and sufficient.

Thoughts?
 

Bev D

Heretical Statistician
Leader
Super Moderator
The major considerations for sample size are
1. The variation of the process = the standard deviation
2. The allowable difference from the target/goal, or the amount of ‘inaccuracy’ you can tolerate in the estimate = delta. This is equivalent to one leg of the risk vector.
3. The ‘confidence’ in the estimate (1 - alpha risk); this is the other leg of the risk vector. This is what most people interpret to be ‘probability’.

These are the inputs to the statistical formulas for determining sample size for continuous and categorical data.

This is what constitutes the statistical rationale.
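For reference, the usual textbook forms of those formulas (standard notation, not necessarily the exact symbols Bev uses) are, for continuous data and for categorical/proportion data respectively:

\[
n = \left( \frac{z_{1-\alpha/2}\,\sigma}{\delta} \right)^{2}
\qquad\qquad
n = \frac{z_{1-\alpha/2}^{2}\; p\,(1-p)}{\delta^{2}}
\]

where \(\sigma\) is the standard deviation, \(\delta\) is the allowable difference, \(z_{1-\alpha/2}\) is the two-sided critical value for the chosen confidence, and \(p\) is the expected proportion.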
So for Mark’s example, the ‘difference’ is huge and the statistical confidence* can be low. The variation in weight is probably fairly small compared to the ‘tolerance’ for packaging, so 1 is sufficient.
For anpun’s example, because of the low volume, and because this is validation, 1 unit can be justified as sufficient. The number of repeats should be based on the imprecision (standard deviation) of the instrument. This could come from the design requirement or from development data.

*Statistical confidence is NOT what most people think of when they think ‘confidence’.
Like much of quality science, these are complex topics that require deep understanding. A passing ‘remembery’ of formulas is not sufficient.
 

Mark Meer

Trusted Information Resource
As always, thanks Bev! You are amazingly helpful for those statistically-challenged such as myself! :)

For the benefit of other Covers, and to be sure I've got this right, I'd like to go step-by-step through a concrete, hypothetical example.

Suppose the specification is 20 +/- 2

Step 1: Estimate the variation (standard deviation, SD)
We could just "guesstimate" it. We are confident the process produces consistent outputs, so the variation should be relatively small (SD < 1).
Or, alternatively, we make a rough estimate by taking the STDEV of, say, the first 10 units (aside: would this need to be justified?)
STDEV(20,20,21,20,19,18,20,21,20,19) = 0.92
So, for this example, SD = 0.92

Step 2: What is the allowable tolerance (delta, d)
This is taken from the specification (20 +/- 2). So the tolerance is: d = 2

Step 3: What is the desired confidence level (1 - alpha risk)?
We will choose 95%. Looking up the corresponding two-sided critical value (t-table, large sample), this is: t = 1.96

Step 4: Calculate the sample size
We use the formula for continuous data: n = [t*SD / d]^2
= [(1.96)*(0.92) / 2]^2 = 0.81
Rounding up to the next whole number gives us n = 1

Hence, according to the calculations above, only 1 sample is required in this example.
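For anyone who wants to check the arithmetic, here is the same calculation as a short Python sketch (standard library only; the 1.96 is taken from the normal distribution, which is effectively what the table lookup above gives):

import math
from statistics import NormalDist, stdev

# Step 1: estimate the standard deviation from the first 10 units
measurements = [20, 20, 21, 20, 19, 18, 20, 21, 20, 19]
sd = stdev(measurements)            # sample SD, about 0.92

# Step 2: allowable difference, from the specification 20 +/- 2
delta = 2.0

# Step 3: two-sided critical value for 95% confidence
z = NormalDist().inv_cdf(0.975)     # about 1.96

# Step 4: sample size for continuous data, rounded up
n = math.ceil((z * sd / delta) ** 2)
print(f"SD = {sd:.2f}, z = {z:.2f}, n = {n}")   # SD = 0.92, z = 1.96, n = 1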

Do I have this correct? :cfingers
 

Bev D

Heretical Statistician
Leader
Super Moderator
That would be a correct calculation.

The critical element here is the choice of delta: the larger delta is, the smaller the sample size.
The thing that is difficult to wrap your head around is the whole idea of the confidence interval and where the results fall. In the case of a single sample, the value must fall beyond the 'known' process average or target by more than the delta amount. With a single sample you can't calculate a confidence interval for the sample...

The concern here is going to be much more than the 'statistical' math, however (nothing is ever that simple). First, using the first 10 units from the process to estimate the SD is OK IF the process is homogeneous; processes rarely are. Next, look at the SD you calculated, 0.92. At +/- 3 SD you certainly have process values beyond the 20 +/- 2 tolerance; not many, but some. So there is a probability that your single result lies beyond 18 or 22 simply by chance (that chance is around 5% given your alpha rate of 5%). The temptation will be to deny that the single result is indicative of a true difference. Larger sample sizes help with this, as does time series data showing the actual process variation.
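To put a rough number on that 'by chance' probability, here is a quick check in Python, assuming (purely for illustration) a normal process centred at 20 with SD 0.92:

from statistics import NormalDist

# Probability that a single result falls outside the 20 +/- 2 tolerance,
# assuming a normal process centred at 20 with SD = 0.92 (illustrative assumption)
process = NormalDist(mu=20, sigma=0.92)
p_out = process.cdf(18) + (1 - process.cdf(22))
print(f"P(single result outside 18..22) = {p_out:.3f}")   # a few percent

A few percent, so small but very real; the exact figure depends entirely on how trustworthy that SD estimate is.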
 

Bev D

Heretical Statistician
Leader
Super Moderator
Anpun - you can use one machine and calculate your sample size as the number of 'runs'. You can use the continuous data formula Mark posted above. Try that and see what you get...
 

Statistical Steven

Statistician
Leader
Super Moderator
I think an underlying point of confusion is the FDA's requirement for "Sampling plans, when used, shall be written and based on a valid statistical rationale." (21 CFR 820.250(b)).

Is there any guidance as to what is accepted as "valid statistical rationale"? As I've suggested in this thread, we want selection of samples to be tied to risk...but - QA-Man's point regarding probability notwithstanding - this is pretty much a judgement call.

For example, we have a requirement the product weighs less than X. This is strictly so that we can estimate shipping, and has no bearing on safety or intended use. Therefore, in verification, we weigh the output of one unit of production, and that's it. I'd have a hard time arguing that this was a "valid statistical rationale" for choosing a sample size of 1, but for our needs it seems appropriate and sufficient.

Thoughts?
The key is "when used". If you have a non-statistical plan, there is no statistical rationale needed. Your assessment based on risk and engineering knowledge is that a sample size of 1 is appropriate. The problem is when people sample n=3, but have NO rationale, either statistically or otherwise.
 

Mike S.

Happy to be Alive
Trusted Information Resource
Steven,

I hate it when standards use terms like "when used" or "where appropriate" ambiguously, as they do (IMO) in 21 CFR 820.250. Maybe they mean what you suggest (non-statistical sampling is okay if you say it is). Maybe they mean that if you don't 100% test, then you are sampling, and then you need supporting rationale.

It would be much clearer if they used language like this: "Seller shall perform 100% inspection, or, if sampling inspection is used, the sampling shall conform to the requirements of blah blah blah..."
 