Process Validation Sampling and Data Analysis Techniques (medical devices)

medtech.panda

Hello!

I am validating a manufacturing line (IQ/OQ/PQ) for a medical device for the first time and am looking for some input on my sampling and data analysis techniques; I am mainly looking to focus on OQ/PQ since IQ seems straightforward.

OQ:
The requirements we developed are not exclusively quantitative, so I am planning to use a combination of techniques for both attribute and continuous data. This is a low risk electronic device, so I am planning to target 95% confidence / 90% reliability, which I know would require n=29 samples for my attribute requirements. I do not have prior data I can use for the quantitative requirements, so I am assuming that I will need n=30 samples to be able to perform a meaningful statistical analysis. However, these devices can be expensive to manufacture, so I am planning to use n=3-5 "golden" devices (we inspect them from our supplier and sign off on them prior to putting them through our portion of the manufacturing process), but then testing each sample multiple times to get 29 or 30 total data points to analyze.
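For reference, here is a minimal sketch of where I believe the n=29 (and later n=59) figures come from, assuming the usual zero-failure "success-run" binomial formula:

```python
import math

def success_run_n(confidence: float, reliability: float) -> int:
    """Smallest zero-failure sample size n such that reliability**n <= 1 - confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))

print(success_run_n(0.95, 0.90))  # 29 -> 95% confidence / 90% reliability
print(success_run_n(0.95, 0.95))  # 59 -> 95% confidence / 95% reliability
```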

This feels like it might be overkill a bit since the tests we're running should really not have any variability across multiple test runs (we are just flashing firmware and making sure other components turn on / can communicate), but I am pretty sure we still need a statistically significant sample size?

PQ:
From doing some research, it seems that the standard is to build 3 lots that you then sample from; I am planning to take n=29 or n=30 samples from each lot using the same rationale I was discussing in OQ - my PQ requirements will also be a mix of quantitative / qualitative checks. I was planning on doing a similar data analysis (targeting 95% confidence / 90% reliability), but then I started to learn about another parameter - Cpk - that seems to get referenced in various process validation forums.

Is there a preference for using tolerance intervals vs Cpk in the medical device world? Additionally, Cpk seems to require quantitative data, so unsure how it would work for any qualitative requirements/specifications?
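To make my question concrete, here is a rough sketch of how I understand each approach would be applied to the same quantitative data (normality assumed; the data, spec limits, and the 95/90 target below are purely illustrative):

```python
import numpy as np
from scipy import stats

# Illustrative data and spec limits only
rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=0.2, size=30)
lsl, usl = 9.0, 11.0

n, mean, s = len(data), data.mean(), data.std(ddof=1)

# One-sided normal tolerance bounds for 95% confidence / 90% reliability
conf, rel = 0.95, 0.90
k = stats.nct.ppf(conf, df=n - 1, nc=stats.norm.ppf(rel) * np.sqrt(n)) / np.sqrt(n)
lower, upper = mean - k * s, mean + k * s  # each bound covers >=90% of the population with 95% confidence
passes = (lower >= lsl) and (upper <= usl)

# Cpk from the same sample (a point estimate with no confidence statement attached)
cpk = min(usl - mean, mean - lsl) / (3 * s)

print(round(lower, 3), round(upper, 3), passes, round(cpk, 2))
```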

Any and all feedback on my strategy outside of my questions is more than welcome, this strategy was developed through online research which included a lot of forums here. Thank you!
 
Hello!

I am validating a manufacturing line (IQ/OQ/PQ) for a medical device for the first time and am looking for some input on my sampling and data analysis techniques; I am mainly looking to focus on OQ/PQ since IQ seems straightforward.
So first let me say that trying to do something you know nothing about is a bit…well, you know. It is good to come here and ask questions though, as you know what some of your limitations are. Seriously, you don’t even know what you don’t know, and this is far more complex than a simple Google or ChatGPT search. So welcome to a group of people who have spent decades studying, practising and learning how to do this. We have discussed this a million times, so a search here would be quite helpful for you. First you can start by reading my paper on “Sampling for Validation Testing”.
OQ:
The requirements we developed are not exclusively quantitative, so I am planning to use a combination of techniques for both attribute and continuous data. This is a low risk electronic device, so I am planning to target 95% confidence / 90% reliability, which I know would require n=29 samples for my attribute requirements. I do not have prior data I can use for the quantitative requirements, so I am assuming that I will need n=30 samples to be able to perform a meaningful statistical analysis. However, these devices can be expensive to manufacture, so I am planning to use n=3-5 "golden" devices (we inspect them from our supplier and sign off on them prior to putting them through our portion of the manufacturing process), but then testing each sample multiple times to get 29 or 30 total data points to analyze.
What do you mean by ‘low risk’? The only thing of value here is if the severity of failures is trivial to very minor. A 90% reliability means you will ‘pass’ the validation even if there are up to 10% defects in the lot. Is a 9% defective rate OK?
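To be concrete about what a zero-failure n=29 plan does and does not demonstrate, here is a minimal sketch of its acceptance probability at a few true defect rates:

```python
# Probability that a lot with true defect rate p yields 0 failures in n = 29 samples
n = 29
for p in (0.01, 0.05, 0.09, 0.10):
    print(p, round((1 - p) ** n, 3))
# 0.01 -> 0.747, 0.05 -> 0.226, 0.09 -> 0.065, 0.10 -> 0.047
# Passing only demonstrates (with ~95% confidence) that the defect rate is below 10%.
```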

There are some who will accept this sampling plan, but they are not knowledgeable about validation or statistical sampling. This will only tell you that the material in front of you does not exceed some defect level, IF your sampling frame was correct. It is not predictive of future results.

Measuring golden parts is not the right sampling frame. At all. It will validate nothing.

The part about measuring a few parts several times is just a math trick, not statistics. The problem lies in the fact that the ‘repeat’ readings are not independent, so while you might be able to make some ‘statistical calculations’, they will not be valid.
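A minimal simulation sketch of the problem, with hypothetical numbers (unit-to-unit variation much larger than repeat-reading noise):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical process: unit-to-unit std = 1.0, repeat-measurement noise std = 0.05
units = rng.normal(100.0, 1.0, size=5)                                    # 5 "golden" devices
readings = np.concatenate([u + rng.normal(0.0, 0.05, 6) for u in units])  # 30 readings total

# The 30 readings contain only 5 independent pieces of information about the process;
# their spread is driven by which 5 units were picked, not by n = 30 independent units.
print(round(np.std(units, ddof=1), 3), round(np.std(readings, ddof=1), 3))
```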
This feels like it might be overkill a bit since the tests we're running should really not have any variability across multiple test runs (we are just flashing firmware and making sure other components turn on / can communicate), but I am pretty sure we still need a statistically significant sample size?
What exactly is the output characteristic(s) of interest?
PQ:
From doing some research, it seems that the standard is to build 3 lots that you then sample from; I am planning to take n=29 or n=30 samples from each lot using the same rationale I was discussing in OQ - my PQ requirements will also be a mix of quantitative / qualitative checks. I was planning on doing a similar data analysis (targeting 95% confidence / 90% reliability), but then I started to learn about another parameter - Cpk - that seems to get referenced in various process validation forums.

Is there a preference for using tolerance intervals vs Cpk in the medical device world? Additionally, Cpk seems to require quantitative data, so unsure how it would work for any qualitative requirements/specifications?
Well, Cpk is stupid (sorry, I’m old and cranky and I’m tired of people clinging to the Cpk thing like they believe that the earth is flat and we didn’t go to the moon…). Search here for discussions regarding Cpk.

The caveat is that if your organization or reviewers or auditors are at a typical level of shallow knowledge of statistics this plan will work - it will check the box. But it won’t be helpful or insightful to you.
Any and all feedback on my strategy outside of my questions is more than welcome, this strategy was developed through online research which included a lot of forums here. Thank you!
You’ve come to the right place and are asking the right questions. Good for you. Please do some more research. Here is a good start: search topics like Cpk and validation sampling, and check the resource section.


I’m sure others will pipe in and some will say that you can do what you propose. And of course you can - everyone deserves to check the box and get lucky. But do some real research, put on your critical thinking cap and look to differentiate myths from real statistics.
 
So first let me say that trying to do something you know nothing about is a bit…well, you know. It is good to come here and ask questions though, as you know what some of your limitations are. Seriously, you don’t even know what you don’t know, and this is far more complex than a simple Google or ChatGPT search. So welcome to a group of people who have spent decades studying, practising and learning how to do this. We have discussed this a million times, so a search here would be quite helpful for you. First you can start by reading my paper on
Hi Bev, appreciate the quick response! I read your paper and the key takeaway I got is that the sampling plan should include testing at the upper and lower spec limits to ensure the product we are producing is consistent across the full operating range of our equipment. I am assuming this primarily applies to OQ, since PQ (from what I understand) needs to be run at normal operating conditions.

What do you mean by ‘low risk’? The only thing of value here is if the severity of failures is trivial to very minor. A 90% reliability means you will ‘pass’ the validation even if there are up to 10% defects in the lot. Is a 9% defective rate OK?
We performed a risk analysis via pFMEA that resulted in any residual risk after mitigation being low. A 10% defect rate is fine for where this company is at for now. I understand this defect rate is quite high, but we do not have the resources to test n=59 samples (for 95% reliability) or anything beyond that... Don't shoot the messenger!

There are some who will accept this sampling plan, but they are not knowledgeable about validation or statistical sampling. This will only tell you that the material in front of you does not exceed some defect level, IF your sampling frame was correct. It is not predictive of future results.

Measuring golden parts is not the right sampling frame. At all. It will validate nothing.

The part about measuring a few parts several times is just a math trick, not statistics. The problem lies in the fact that the ‘repeat’ readings are not independent, so while you might be able to make some ‘statistical calculations’, they will not be valid.

What exactly is the output characteristic(s) of interest?
Understood! I am going to give a bit more detail on our equipment setup, since I think it is a little unique compared to the examples I am seeing talked about in these forums. Our equipment does not have any configurable settings, so there are no upper/lower specs that we set ourselves. We are mounting a PCBA to our equipment and flashing firmware / serializing it, that is it. The equipment then performs a series of QC checks on key device sensors (making sure device Bluetooth/NFC communication is working, the device temperature sensor is working, etc.), which can be quantitative or qualitative, but we are not changing/doing anything else to the device. So from reading your paper, it seems that I can just test that any quantitative QC check passes at the upper and lower limit, even if it is just one sample at each (assuming the physics is well understood, which I believe it should be here).

My initial plan for the golden samples (along with known bad samples to validate the failing-result check) was a way to validate that the QC checks functioned as intended, but I understand this is getting more into general test method validation, which I suppose should be performed separately instead of nested in this process validation?

Well, Cpk is stupid (sorry, I’m old and cranky and I’m tired of people clinging to the Cpk thing like they believe that the earth is flat and we didn’t go to the moon…). Search here for discussions regarding Cpk.

The caveat is that if your organization or reviewers or auditors are at a typical level of shallow knowledge of statistics this plan will work - it will check the box. But it won’t be helpful or insightful to you.

You’ve come to the right place and are asking the right questions. Good for you. Please do some more research. Here is a good start: search topics like Cpk and validation sampling, and check the resource section.


I’m sure others will pipe in and some will say that you can do what you propose. And of course you can - everyone deserves to check the box and get lucky. But do some real research, put on your critical thinking cap and look to differentiate myths from real statistics.
No need to apologize, I was suspicious of Cpk and thank you for confirming those suspicions. I will plan to use confidence/reliability levels to sample from 3 lots.

Thank you for the thorough response!
 
This doesn't sound like a classic mechanical assembly process for which "process validation" (in the IQ/OQ/PQ sense) is applicable... at least without some "torture".

This feels like it might be overkill a bit since the tests we're running should really not have any variability across multiple test runs (we are just flashing firmware and making sure other components turn on / can communicate), but I am pretty sure we still need a statistically significant sample size?

If every device is being checked for a "successful flash"... this would be 100% verification, so the need for a classic process validation is moot.

If you want to convince yourself that the "process of flashing" is successful with 95% confidence and 95% tolerance, 5 of 5 successes would be sufficient (based on a null hypothesis test of H0 = 50% that maybe it works, maybe it don't, and an alternate hypothesis of H1 = 99%).
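A quick sketch of the arithmetic behind that 5-of-5 statement, assuming a simple binomial test of those hypotheses:

```python
# Under H0 (p = 0.5, "maybe it works, maybe it doesn't"), 5 straight successes are rare:
print(0.5 ** 5)    # 0.03125 < 0.05 -> reject H0 at 95% confidence
# Under H1 (p = 0.99), 5-of-5 is the expected outcome almost every time:
print(0.99 ** 5)   # ~0.951
```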

Process validation is done to account for variability of the process: for medical devices, the OQ challenges the allowed variability inherent to the process (allowed process ranges), while the PQ challenges the variability due to the inputs of the process (materials, people, time of day). The sample sizes are constructed based on the types of understanding you are trying to develop.
 
Well, as I’ve said before, standards only require compliance to minimal quality standards. IQ, OQ, PQ or other types of validation are certainly a good thing to do regardless of whether or not the processes are ‘special’. I can say that from years of experience where good validations were NOT done and the products experienced a plethora of quality problems at launch and as changes were made in the supply chain and in the manufacturing processes - the cost of which far exceeded the time and cost ‘savings’ of not doing the validations.

But the bigger point here is that ‘flashing’ a device is pretty close to a ‘special’ process, as all of the functions of the software are rarely ever fully tested, including the functions that are affected by intermittent conditions. I’ve been burned by this many times (pun intended), and the fact that this is a low-quality organization (as described by the OP) makes a poor flash even more likely.

Validation is always a good thing to do unless the only failures that can occur are truly trivial or very minor in severity. Unless of course you are a fly-by-night organization that just wants to gut the golden-egg-laying goose ;)
 
In my experience: a 'special process' is one with an output that cannot be verified... typical examples are sterilization and welding. Memory flashing can be verified; at one place we did a sort of checksum activity to verify that flashing occurred.
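(As an aside, a minimal sketch of what such a read-back check can look like; the file names here are hypothetical:)

```python
import hashlib

def sha256_of(path: str) -> str:
    """SHA-256 digest of a firmware image or memory read-back dump."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Hypothetical files: compare the released image against what was read back from flash
assert sha256_of("firmware_release.bin") == sha256_of("device_readback.bin")
```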
 
Performing 100% verification for the flashing seems like a practical and viable method of avoiding process validation; however, our equipment performs a series of QC checks as well, which I am assuming would require some level of validation. This is why I was planning to use "golden" and "known bad" samples to validate that we would get a pass or a fail from each test when appropriate. I am trying to avoid this whenever possible since I believe it would be better to use a slightly more objective method to validate functionality of the different equipment components that perform these checks. This is easier for some tests (ex. temperature sensing) than others.
 
Performing 100% verification for the flashing seems like a practical and viable method of avoiding process validation; however, our equipment performs a series of QC checks as well, which I am assuming would require some level of validation. This is why I was planning to use "golden" and "known bad" samples to validate that we would get a pass or a fail from each test when appropriate. I am trying to avoid this whenever possible since I believe it would be better to use a slightly more objective method to validate functionality of the different equipment components that perform these checks. This is easier for some tests (ex. temperature sensing) than others.
Also want to quickly clarify that I am only talking about OQ here, PQ will involve sampling from 3 lots built under normal conditions (no golden samples or anything like that)
 
Also want to quickly clarify that I am only talking about OQ here, PQ will involve sampling from 3 lots built under normal conditions (no golden samples or anything like that)
What are the process variables that you will challenge during the OQ?
 
In my experience: a 'special process' is one with an output that cannot be verified... typical examples are sterilization and welding. Memory flashing can be verified; at one place we did a sort of checksum activity to verify that flashing occurred.
But I am not saying this is a special process. I am saying that unless you test all functions under all conditions, you will miss things. Sure, a checksum will tell you that a flash of some quality level occurred. It will not tell you it was correct.

Please note that the OP isn’t asking about whether this is “required” or not. To their credit they are asking about how to do it correctly under poor management conditions…

If you don’t care if you miss stuff, that’s your problem. I remember one incident when a firmware flash ‘passed’ the minimal ‘flash was successful’ test and wrong answers were given to a diagnostic test. My recommendation is always to forget the damn ‘minimal quality’ standards. Stop contract lawyering. Use common sense and good engineering practice. Both of which are all too rare these days.
 