# Sample Size Justification for Medical Device Shelf-Life


#### NicoleinFlorida

Hello everyone,

I have a question about how to determine the sample size for a medical device shelf-life test. Our device is a Class II medical device, and the expected shelf-life is 3 years.

The shelf-life test will be conducted in five stages:
1. Time 0 No aging
2. 6-Month accelerated aging (for 510k)
3. 36-Month accelerated aging
4. 6-Month real-time aging
5. 36-Month real-time aging

During each stage, package tests (integrity, seal strength, and 100% visual inspection) and device performance tests will be conducted.

In our protocol, we used an AQL Level I plan that required 32 samples for each test. I am wondering whether we can reduce this number with a rational statistical justification.

Based on the thread about V&V sample size, I am considering using R=0.95, C=0.95, which gives N=59. Does this "N" mean the number of devices for each test? Or, if we have three tests for the package, can N be divided by 3, which would mean that we can use 20 samples per test?

I will really appreciate your responses.

#### Bev D

##### Heretical Statistician
Super Moderator
It really depends on what you are trying to do, although the AQL sampling plans are NOT appropriate for these types of tests.

If you only want to demonstrate that some percentage of your product will survive beyond some specified period of time, the confidence and reliability plan will suffice. The sample size applies to EACH of the tests you specified, time zero through 36 month real time. You may not need to perform each of those, but regardless, the sample size applies to each.
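As a rough illustration of where the N=59 figure comes from: for a zero-failure (c=0) attribute plan, the usual success-run relationship is n = ln(1 - C) / ln(R), rounded up. The sketch below assumes that relationship and a pass/fail test; the function name is made up for this example.

```python
import math


def zero_failure_sample_size(reliability: float, confidence: float) -> int:
    """Smallest n such that n passes with zero failures demonstrates
    `reliability` at `confidence` (success-run relationship, c=0 plan)."""
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must be between 0 and 1")
    # Solve R**n <= 1 - C for n and round up to a whole unit.
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


if __name__ == "__main__":
    for r, c in [(0.95, 0.95), (0.90, 0.95)]:
        n = zero_failure_sample_size(r, c)
        print(f"R={r:.2f}, C={c:.2f}: n={n} per test, per time point")
```

Because each test at each time point is its own pass/fail demonstration, the n from this calculation is not split across tests.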

If you are trying to model the stability, using continuous data to predict the failure rate over time, the sample size will be based on the part-to-part variation at time zero and you will need to use continuous sample size calculations for this approach. Again, the sample size will apply to each time period tested.

Some people will select smaller sample sizes based on convenience, but these are not statistically justifiable. Depending on your reviewer and the nature of your product, this may be acceptable or it may not.


#### NicoleinFlorida

Hello Bev D
Thank you for the response.

The objective of this test is to establish the shelf-life of our devices. Therefore, your first recommendation, the confidence and reliability plan, fits best.

The materials used in our devices are mainly PC and stainless steel, whose shelf-life will be much longer than the expected 3 years. Therefore, the sterile barrier system (package) will be the most critical element in establishing the shelf-life. In that case, could we use smaller sample sizes for the devices?

For the package, because it is the most critical part for shelf-life testing, we would like to adopt the confidence and reliability plan.

Thank you again.

#### Bev D

##### Heretical Statistician
Super Moderator
It doesn't matter what you are testing for shelf life. The sample size remains the same. You may be able to test only the packaging without using the PC inside of it, but this isn't a statistical question; it's a physics question and a question for your reviewer: does the shape and size of what's going to be inside the packaging affect the shelf life of the packaging?

#### Statistical Steven

##### Statistician
Super Moderator
Nicole

I have a few questions for you to get a better understanding of your issues.

1. Does 6 months accelerated aging equate to 36 months real time based on your Arrhenius model?
2. Why are you doing 36 months accelerated aging if you are doing 36 month real-time testing?
3. Why are you only testing 6 and 36 months real time? For stability studies it is more beneficial to have more time points than more samples per time point, especially if you have a failure at 36 months, as your label claim would then be only 6 months based on your data.
4. If you are doing go/no-go testing at each time point, sample sizes will be large since, as Bev mentioned, you can only make confidence and reliability statements (this could be the rationale for accelerated conditions). Are there any continuous-type tests that can be done?
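For question 1, a common simplification of the Arrhenius model is the Q10 rule: the accelerated aging factor is Q10 raised to (T_accelerated - T_ambient) / 10. The sketch below assumes Q10 = 2 and hypothetical temperatures of 55 °C (chamber) and 25 °C (ambient); neither temperature comes from this thread, so substitute the values from your own protocol.

```python
def accelerated_aging_factor(q10: float, t_accelerated_c: float, t_ambient_c: float) -> float:
    """Q10-rule acceleration factor (simplified Arrhenius relationship)."""
    return q10 ** ((t_accelerated_c - t_ambient_c) / 10.0)


def accelerated_aging_months(real_time_months: float, q10: float,
                             t_accelerated_c: float, t_ambient_c: float) -> float:
    """Chamber time needed to simulate `real_time_months` of ambient storage."""
    return real_time_months / accelerated_aging_factor(q10, t_accelerated_c, t_ambient_c)


if __name__ == "__main__":
    # Hypothetical conditions, for illustration only.
    q10, t_acc, t_amb = 2.0, 55.0, 25.0
    factor = accelerated_aging_factor(q10, t_acc, t_amb)
    months = accelerated_aging_months(36.0, q10, t_acc, t_amb)
    print(f"Acceleration factor: {factor:.1f}")
    print(f"Chamber time to simulate 36 months: {months:.1f} months")
```

Under these assumed conditions, 36 months of real time would correspond to well under 6 months in the chamber, which is why it matters what the "6-month" and "36-month" accelerated stages are meant to represent.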


#### NicoleinFlorida

Hello Steven,
1. No. The 6-month accelerated aging is for the 510k application.
2. We would like to label our product with a 3-year shelf-life based on the accelerated aging data. The 3-year real-time aging is to validate the accelerated aging.
3. Do you mean that we can have more time points with a smaller sample size? For instance, in our original plan, we will test 32 samples at 6 months and another 32 samples at 36 months. So your suggestion is to test maybe 10 samples at 6, 12, 18, 24, 30 and 36 months? According to Bev's comment, I have to ensure the sample size is large enough for each test at each stage. So if I would like to do more tests throughout the 36 months, do I have to increase the total sample size dramatically?
4. Our device is a single-use device, so I am not sure whether we can have continuous test data for the shelf-life. The objective of the tests after the accelerated aging is to check whether the device is functioning as intended. Basically, it will be a go/no-go test. Could you provide some examples of continuous test data?

In general, the device is not cheap and we would like to reduce the sample size for the shelf-life testing, but with a rational justification. Could you give me more advice?

#### Statistical Steven

##### Statistician
Super Moderator
> Hello Steven,
> 1. No. The 6-month accelerated aging is for the 510k application.

Assuming a Q10 of 2, which is typical for medical devices. No issues there.

> 2. We would like to label our product with a 3-year shelf-life based on the accelerated aging data. The 3-year real-time aging is to validate the accelerated aging.

Again, pretty standard for a medical device (I assume it's injection molded).

> 3. Do you mean that we can have more time points with a smaller sample size? For instance, in our original plan, we will test 32 samples at 6 months and another 32 samples at 36 months. So your suggestion is to test maybe 10 samples at 6, 12, 18, 24, 30 and 36 months? According to Bev's comment, I have to ensure the sample size is large enough for each test at each stage. So if I would like to do more tests throughout the 36 months, do I have to increase the total sample size dramatically?

My point is that at 6 and 36 months you can test 32 samples for your 95/90 at each time point. I would do some sampling between 6 and 36 months to get some assurance that the product still functions. Maybe n=10 for a 95/75 assurance at 12 and 24 months.

> 4. Our device is a single-use device, so I am not sure whether we can have continuous test data for the shelf-life. The objective of the tests after the accelerated aging is to check whether the device is functioning as intended. Basically, it will be a go/no-go test. Could you provide some examples of continuous test data?

If the test is just a go/no-go test, then stability is not really what you are testing per se. Continuous testing would be measuring force values, tensile strength or some other measurement. I am always wary of "functioning as intended" testing, as it is a very subjective criterion that does not allow for an assessment of degradation over time. I will give you an example from a past life. We made a connector for tubing that was accelerated life tested over six months. When connected, it needed to function as intended, meaning no leaks. We connected the samples to tubing, and testing showed no leaks. So the test passed. We never tested whether the membrane was more or less brittle, or any other physical characteristics of the product. Just my

> In general, the device is not cheap and we would like to reduce the sample size for the shelf-life testing, but with a rational justification. Could you give me more advice?
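The "95/90 with 32 samples" and "95/75 with n=10" figures can be sanity-checked by asking what reliability n consecutive passes demonstrate at a given confidence: R = (1 - C)^(1/n) for a zero-failure plan. This is only a sketch under that assumption, and the function name is illustrative.

```python
def demonstrated_reliability(n: int, confidence: float) -> float:
    """Reliability demonstrated at `confidence` by n passes and zero failures."""
    if n < 1 or not (0.0 < confidence < 1.0):
        raise ValueError("n must be >= 1 and confidence between 0 and 1")
    # Largest R for which seeing n straight passes has probability <= 1 - C.
    return (1.0 - confidence) ** (1.0 / n)


if __name__ == "__main__":
    for n in (10, 32, 59):
        r = demonstrated_reliability(n, 0.95)
        print(f"n={n:>2}, zero failures, 95% confidence: reliability >= {r:.3f}")
```

With 95% confidence this gives roughly 0.74 for n=10 and 0.91 for n=32, in line with the rounded 95/75 and 95/90 labels.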

#### Bev D

##### Heretical Statistician
Super Moderator
a few other clarifications:

'continuous test data' means measuring a continuous variable as opposed to an attribute or categorical variable. continuous variables are measurable quantities such as force, tensile strength, permeability, etc., and categorical data are count data such as the number of passing or failing parts.

categorical data will almost always result in larger sample sizes than continuous data.

if you are highly confident that the 36 month test will pass, you only need to test at that time period. but as Steven pointed out, you may want to play it safe and test some parts at interim time points just in case they fail earlier than you might expect. in this case the sample size doesn't have to be 'statistically' justified, as you are doing it to protect yourself from an early failure that you otherwise might not detect until 36 months. of course, smaller sample sizes at interim time points increase the risk of missing a slight degradation before 36 months. however, this is your business risk to determine...

sample size at interim time points only matters if you intend to 'model' the degradation. While you can do this for categorical data, the sample sizes would have to be very large to get a meaningful model, and it doesn't sound like this would be helpful to you at this time.

#### Statistical Steven

##### Statistician
Super Moderator
Bev

My suggested sample size of 10 at interim time points is to find a catastrophic issue. The truth is that a sample size at t=0 and t=36 months with a c=0 sampling plan really has no statistical power; it is intended to find large issues with "stability". I tend not to use such small sample sizes for performance-based testing, but I do for package integrity testing, where either all the samples will usually pass or a large percentage will fail (greater than 10%). A sample size of 32 is about 95/90, and even if you pool the samples to get a sample size of 64, that is still only approximately 95/95, so a 5% defective rate can still pass roughly 4% of the time.

Just my
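To put numbers on what a c=0 plan can and cannot catch: if the true defective rate is p, the chance that all n samples pass is (1 - p)^n. The sketch below assumes independent units and a single true defect rate; the function name is made up for this example.

```python
def prob_pass_zero_failures(n: int, defect_rate: float) -> float:
    """Probability that all n independent samples pass when the true
    fraction defective is `defect_rate` (c=0 acceptance plan)."""
    if n < 1 or not (0.0 <= defect_rate <= 1.0):
        raise ValueError("n must be >= 1 and defect_rate between 0 and 1")
    return (1.0 - defect_rate) ** n


if __name__ == "__main__":
    for n in (10, 32, 64):
        for p in (0.01, 0.05, 0.10, 0.25):
            passed = prob_pass_zero_failures(n, p)
            print(f"n={n:>2}, true defect rate={p:.0%}: "
                  f"P(pass)={passed:.3f}, P(detect)={1 - passed:.3f}")
```

The table this prints shows the pattern described above: n=10 is likely to flag a catastrophic problem (a quarter of units failing) but will usually miss a defect rate of a few percent, while even n=64 passes a 1% defective population more often than not.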


#### NicoleinFlorida

Hello Bev and Steven,

Thank you for all the clarification.

The materials of our devices will not degrade much over the shelf-life, especially the inner mechanism, which is made of stainless steel. The integrity of the package will mainly determine the shelf-life. However, we need to have finished devices inside the package to perform the package integrity test, shipping simulation test and device performance test in order to establish the shelf-life.

I agree with you that adding more interim time points can provide more protection for us, and may identify failures early.

For the stability testing, we assess the performance of our devices, which results in pass/fail categorical data. In our case, unfortunately, we cannot use continuous data to reduce the sample size.