# Sample Size for Distribution Simulation Testing

#### PkgTest

##### Registered
Hi,

I am new to this board and have been reading many of the posts on sample size. What great information. I have a question about determining sample size for distribution simulation testing.

I work for a consumer products company that produces a wide range of products. We ship door to door via the single-parcel distribution channel (UPS, FedEx, etc.). All of a customer's products are shipped together in one box based on what they order. We conduct distribution simulation testing in our lab using vibration and drop equipment. We put a set mix of mock products into the box along with the sample we are testing; the mock product mix is based on average customer orders taken over several months. We classify each test sample on a 0-3 scale (no damage, minor, moderate, major): 0 or 1 is a pass, 2 is borderline, and 3 is a fail. The samples we receive for testing are collected during manufacturing line trials. These trials typically run about 2000 samples, which are submitted to various groups for testing, ours being one. Right now, we request 59 samples based on attribute data sampling (n=59 for 95/95), and we say that if we see zero failures after testing, we have an upper limit of 5% damage. For more high-end products, we have even used a sample size of 120 to claim an upper limit of 2.5% damage. Cost of the product is not a factor.
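(For reference, the 59 and 120 figures follow from the standard zero-failure "success-run" relationship, R^n ≤ 1 − C. A quick Python sketch, purely illustrative:)

```python
import math

def zero_failure_sample_size(confidence: float, reliability: float) -> int:
    """Sample size n for a c=0 attribute plan: if all n samples pass, we can
    state with `confidence` that the survival rate is at least `reliability`.
    Derived from reliability**n <= 1 - confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

print(zero_failure_sample_size(0.95, 0.95))   # 59  -> the 95/95 plan
print(zero_failure_sample_size(0.95, 0.975))  # 119 -> close to the 120 used for high-end products
```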

Does this approach make sense? Can I use the 0-3 scale as variable data to lower the sample size? If I can use the scale as variable data, what formula would I use to calculate sample size? We do many tests, so reducing sample size while still keeping a good confidence in the results would be appreciated.

Thanks in advance for any information.

#### Miner

##### Forum Moderator
Staff member
@Bev D please chime in here. Bev has discussed worst case testing schemes that minimize the need to test such large sample sizes.

#### Bev D

##### Heretical Statistician
Staff member
Super Moderator
Give me a few hours and I’ll publish a brief description of how - and why - we do this with really small samples.
I’ll also describe why you absolutely cannot treat the 0-3 ordinal scale as variables data. Although @Miner will likely beat me to that.

#### PkgTest

##### Registered
Thanks Bev D. and Miner. Really appreciate and looking forward to your description.

#### Miner

##### Forum Moderator
Staff member
Without going into a lot of detail, it is really a matter of how much information is contained in the data. Pass/Fail (binomial) data has the least information content, as there are only two possible states with counts for each. Ordinal data such as your 0-3 scale provides a little more information because it adds more states (4 for your scale) plus some directionality (severity, in your case). However, ordinal data cannot provide the amount or the quality of information provided by continuous data, which has an infinite number of states. Ordinal data cannot tell you how much more severe a 1 is than a 0, or a 3 than a 2. Continuous data can: 2 kg has exactly twice the mass of 1 kg, and 10 kg has exactly twice the mass of 5 kg. In addition, ordinal scales are often nonlinear. Take the ordinal classifications of hot peppers vs. the Scoville SHU measurement.
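(The nonlinearity point can be made concrete with ballpark Scoville values; the SHU figures below are rough illustrative numbers, not authoritative measurements:)

```python
# Ordinal rank vs. the underlying continuous measurement.
# Approximate, illustrative Scoville (SHU) values only.
peppers = {          # ordinal "heat class": (example pepper, approx. SHU)
    1: ("poblano",    1_500),
    2: ("jalapeno",   5_000),
    3: ("habanero", 200_000),
}

# Equal ordinal steps (1->2 and 2->3) hide wildly unequal real differences:
step_1_2 = peppers[2][1] / peppers[1][1]   # ~3.3x hotter
step_2_3 = peppers[3][1] / peppers[2][1]   # 40x hotter
print(step_1_2, step_2_3)
```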


#### Bev D

##### Heretical Statistician
Staff member
Super Moderator
Sorry, I don’t yet have the more detailed write-up, with pictures. I can probably post it on Tuesday. Until then, the idea is to test at the extremes. By extremes I mean at the specification limits, which are supposed to be set to guarantee no failures. The limits are supposed to be determined through experimentation. Of course this doesn’t always happen, but certainly stuff in the middle of the specifications will likely work.

The idea of the confidence/reliability plan, or any other sampling plan for validation (as opposed to lot-to-lot acceptance testing), is to randomly select samples from across the specification range so that you get some good and some ‘bad’. IF the sample is truly random from a population that is in fact representative of the distribution of parts that will be produced in the future across the specification range, THEN the testing will give you a “pass/fail” indication that the future population will have no more than the established defect rate. The defect rate may be less than that; this type of sampling is not intended to estimate the likely defect rate. And since it is a single sample, there is some probability that the real defect rate is higher than your stated reliability.

The larger failure with these types of sampling plans is that you usually don’t have a truly representative sample, because you didn’t sample from a representative population. That would require you to build parts at the limits of the specification, and in validation people rarely do that - they produce at the target (or at the starting position, in the case of mold wear).

For situations where the failure may be dependent on an interaction (tolerance stack-ups or conditions for failure), you have a distinct advantage to leverage in addition to using only parts at the extremes. Many failures are the result of the stress-strength interaction: if the stress is strong enough and the part is weak enough, you get failure; under the same high stress, a strong enough part won’t fail. Packaging testing is exactly this situation; it doesn’t take much imagination. So if you combine the ‘weakest’ packaging allowed by your specifications, the worst case packing scenario AND the worst case stresses (nothing foolish - the stress has to actually exist in real life and be one you guarantee survival of), then you only need to test ONE sample. If it passes, everything stronger will pass under weaker stresses. If it fails, you know you have to improve something on your end.
(It’s like the ‘mouse in the house’ problem: do you really need to establish exactly how many mice are in your house before you begin eradication procedures? Don’t you KNOW that if you saw one, there are many more? And they didn’t ring the front door bell to get in...)
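(The stress-strength logic above can be sketched numerically; the units, distributions, and spec limits below are entirely hypothetical, chosen only to illustrate why passing worst-case-part-at-worst-case-stress implies zero field failures:)

```python
import random

random.seed(1)

# Hypothetical units: package "strength" varies across the spec range; shipping
# "stress" varies across routes. A package is damaged when stress > strength.
N = 100_000
strengths = [random.uniform(10.0, 14.0) for _ in range(N)]  # spec limits: 10 (weakest) to 14
stresses  = [random.uniform(2.0, 9.0) for _ in range(N)]    # field stresses, all below 10

# Worst case test: ONE part at the weak spec limit, at the maximum field stress.
worst_case_passes = max(stresses) < 10.0    # True -> the weakest allowed part survives

# Then no random part/route combination in the population can fail:
field_failures = sum(s > k for s, k in zip(stresses, strengths))
print(worst_case_passes, field_failures)    # True 0
```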

So when you test at the extremes only, you don’t need a statistical sampling plan. You are not trying to overcome the unknown distribution of parts and the unknown distribution of conditions in the field by random sampling - you have deliberately selected the worst case parts and are testing at the worst case conditions. If you pass there, you pass at the best case conditions...

#### PkgTest

##### Registered
Thanks Bev D. Regarding testing at the extremes, by your definition - “the specification limits which are supposed to be set to guarantee no failures” - I think that may be where our testing is not right. The distribution test we run in our lab is based on vibration and drop field data we recorded over hundreds of actual truck routes in our biggest market. We put recorders in our boxes and sent them out over several months, then set our test at the 95th percentile of the vibration and drops we recorded. So basically, we are testing near the worst extremes we saw in the field. This pretty much guarantees we see failures during our test instead of guaranteeing no failures. On the occasions where we see no failures, we feel really confident that our product will survive the shipment to our customer. On the other end of the spectrum, we typically see a good percentage of failures during our test. So the question we face becomes: now what? Will we actually see this many failures in the field? Well, I guess, if we are shipping on our worst routes. But those routes comprise only a small percentage of our shipments (according to our field recorder data, only the top 5%). Should we set our test at the 50th percentile of the data we collected, since you said the middle of the specification will likely work?

I have read many of your posts on sample size and have learned so much. I really look forward to your response.

Thanks.

#### outdoorsNW

##### Quite Involved in Discussions
If I understand correctly, the data was collected from a sample of shipping on typical routes with typical products. If you test at the 95th percentile level, you will likely see failures of around 5%. These may or may not be acceptable; if your goal is high customer satisfaction, they probably are not. If you test at the 50th percentile, then about 50% of your shipments will be damaged in shipment.

If your test focused on the most damage-prone products, then the real-world failure rates will be better, because most products are not as likely as the test products to be damaged. If you knew which shipping routes and methods create the greatest risk of shipping damage, then the real-world numbers would be better still. But it is not clear to me that your testing to date did anything to focus on worst-case-type conditions.

#### PkgTest

##### Registered
We sent out multiple field recorders multiple times on more than 100 different routes in our biggest market. Let's say we sent them on exactly 100 routes: we selected the data from the route at the 95th percentile for worst vibration and drops, and we use that data to run our test. So only 5 routes experienced worse vibration and drops than what we test. Currently, we use the attribute sample size of 59 (c=0, 95/95). If all samples pass, we are confident that our field damage will be low (though another question is: are we overpackaging?). However, if samples fail our 95th percentile test, what percentage of failures would we have seen at the 90th, 75th or 50th percentile, to put the damage in perspective? Is testing at the 95th percentile of vibration and drops the proper way to go about this, and can a sample size less than 59 be used? Thanks.
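(A side note on the "if samples fail" question: the same attribute framework gives an exact, Clopper-Pearson-style, one-sided upper bound on the defect rate for any number of observed failures, not just zero. A stdlib-only Python sketch:)

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def upper_bound(k: int, n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper confidence bound on the defect rate
    after observing k failures in n samples (bisection on the CDF)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > 1 - confidence:
            lo = mid          # defect rate could still be higher
        else:
            hi = mid
    return hi

print(round(upper_bound(0, 59), 4))   # ~0.0495 -> the 95/95 claim at zero failures
print(round(upper_bound(3, 59), 4))   # the bound degrades quickly once failures appear
```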

#### Bev D

##### Heretical Statistician
Staff member
Super Moderator
I have attached a brief overview of the probabilistic approach to V&V testing along with a critique of the weaknesses of using acceptance sampling plans for V&V testing.

I would also like to comment on two statements that PkgTest has made. Apologies if I didn't interpret their comments correctly.
"If samples fail our 95th percentile test, what percentage of failures would we have seen at the 90th, 75th or 50th percentile to put the damage in perspective?" This is a valid question, of course, if you are trying to confirm a failure rate greater than ~5% at your worst case packing. This type of margin assessment should be completed prior to validation testing; it is actually a development activity. One should never go into validation without knowing what you will discover. Remember that validation is supposed to confirm that you got your design right to meet requirements (although I realize this is too often not the case). If you fail at worst case stress, then you should correct your design to meet requirements. The goal is not to 'pass' validation; it is to have very few damaged packages.
"Is testing at the 95th percentile vibration and drops the proper way to go about this and can a sample size less than 59 be used?" I typically use 1-3 units (based on cost and the organization's nervousness about only using one part for V&V testing) at the worst case condition. If I have no failures at worst case conditions with worst case parts, I will have no failures in the field.

Now of course in my organization, we test many characteristics across many vintages of products so the total sample size used reflects that.

Also, we must acknowledge that some failures will still occur. The use conditions can change drastically - a truck gets squashed by a giant boulder in a mud slide (it happened!) or something equally catastrophic, in the realm of force majeure. For these situations we have recovery and compensation plans in place. But these are truly rare and should never be a reason NOT to improve the quality of your product. There are also some characteristics that we are unaware of in early product manufacture, and they can rise up to get us. In these cases we institute corrective actions and IMPROVE OUR KNOWLEDGE of the system.

#### Attachments

• (attached file, 28.8 KB)