Equipment Qualification sample size - attribute data

Shahar_Waserman

Registered
Hello everyone, and thanks in advance for any help.
I have created an in-house tester/jig for cable continuity; it replaces the continuity test that was done with a Fluke DVM.
The tester shows the user whether the cable was properly manufactured (it is basically a continuity tester). The tester software does not decide pass/fail for the user; the user looks at the results and decides whether the cable is good.
There are no measurements, just a visualization of the cable pin connections.
So the data can be considered attribute data.

Now I am in the process of qualifying the tester, and I wanted to consult with the group: how many cables do I need to test (let's assume I know whether they pass or fail before the test) in order to qualify the tester?
Would 5 of each type be sufficient? If so, what would be the statistical rationale? Would 10 suffice?
I don't have 50 or 100 cables to test (this is not a mass-production type of process).
Since I used the Fluke DVM until now (and no, it was not qualified), would a comparison of results between the Fluke and the new tester add any value?

I couldn't find any help within ANSI/ASQ Z1.4.

Shahar
 

Semoi

Involved In Discussions
It is impossible to come up with a sample size recommendation without any inputs. From my perspective, the inputs should be:
1. Define an acceptance criterion: Somewhere your company needs to define the acceptable (detection probability, confidence) pair for different criticalities. E.g. if the product risk (= criticality) is "high", we need to demonstrate that the test method detects a NOK part with a probability of at least 95% at a confidence of at least 95% -- these values must be defined somewhere, as they need to balance the product risk and the business risk (a short sketch follows after this list).
2. Set the criticality: Use your FMEA to set/define a criticality for each failure mode.
3. A specification: It must be clear which cable is OK and which is NOK.
4. A representative sample: Finally, you will need to obtain a sample which is representative of the population. While this is impossible in practice, because you will never obtain samples representing all possible failure modes, you definitely should select the samples with care.
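As a rough sketch of how item 1 turns into a number (assuming a zero-failure success-run plan, which is one common convention rather than the only one), the required sample size follows from n = ln(1 - confidence) / ln(detection probability):

```python
from math import ceil, log

def zero_failure_sample_size(reliability: float, confidence: float) -> int:
    """Success-run relation: smallest n such that catching all n known-bad
    samples (zero failures) demonstrates `reliability` at `confidence`."""
    return ceil(log(1 - confidence) / log(reliability))

# The 95% / 95% example from item 1 above:
print(zero_failure_sample_size(0.95, 0.95))  # -> 59
```

So a 95%/95% criterion would mean showing that 59 known-bad cables are all flagged by the tester.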


A detection probability of 95% is often enough, because the number of nonconforming parts shipped to the customers scales with the probability of producing nonconforming parts. E.g. if we produce nonconforming parts with a probability of 15%, the calculation goes as follows:
1. Out of 1000 parts, 150 are nonconforming.
2. Only 5% of the 150 nonconforming parts (the ones the test misses) are shipped to the customers.
3. Thus, only 7.5 shipped parts are nonconforming and 1000-150=850 are conforming. Hence, the nonconforming rate is 7.5/850 ≈ 0.88%.
Note that this reasoning does not account for conforming parts that are falsely tested as nonconforming. Hence, the nonconforming ratio increases a little bit. However, the take-away of this calculation is: work on your production process to ensure quality, not on your test method.
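The three numbered steps above can be checked with a few lines of arithmetic; this is only a restatement of the example values (15% defect rate, 95% detection), not new data:

```python
# Escaped-defect arithmetic from the example above (illustrative numbers only).
lot_size = 1000
p_nonconforming = 0.15   # probability of producing a nonconforming part
detection_prob = 0.95    # probability the test catches a nonconforming part

nok = lot_size * p_nonconforming          # 150 nonconforming parts
ok = lot_size - nok                       # 850 conforming parts
escaped_nok = nok * (1 - detection_prob)  # 7.5 nonconforming parts shipped

# Nonconforming rate at the customer, relative to the conforming parts as in the
# post (using ok + escaped_nok in the denominator changes it only slightly).
print(f"escaped NOK: {escaped_nok}, rate: {escaped_nok / ok:.2%}")  # ~0.88%
```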
 

Shahar_Waserman

Registered
Hi, and thank you for the quick reply. The sample size is not intended to evaluate production parts!
It is intended to qualify a tester; it is an evaluation of whether the tester can provide reliable results, somewhat like an MSA (measurement system analysis). But with regard to GRR, there is no operator impact on the test results, so I want to test XX cables which I know are good and XX cables which I know are faulty, and see that the tester provides a reliable test result.
The question is how many parts I should test.

Let's assume the risk is low, since these cables are retested at subsequent process steps during final test.
 

Bev D

Heretical Statistician
Leader
Super Moderator
First remember that GRR is a proper name and not a rigid requirement. If you have no operator effect and only one piece of equipment you can still perform a GRR - you'll simply be testing repeatability and not reproducibility, and that's OK. Although in many cases you should prove that the operator has no effect, especially in equipment qualification. My experience is that even when you might not think it, physics can surprise you and the operators do have an effect. In qualification it is better to be sure…

A statistical (mathematical) determination of sample size is not always necessary (and in reality there is no single correct sample size, although there are a ton of incorrect sample sizes; it really depends on what you are trying to do). If you really have a small number of cables and low production volumes, then a sample size of 10 can be valid, useful and informative IF you select the cables carefully. While it is important to have a couple of definitely good and definitely bad cables, the real information comes from those close to the cutoff between good and bad, and especially from known intermittent cables (e.g. a loose connection).
 

Semoi

Involved In Discussions
When you say you are interested in qualifying “a tester” you mean a measuring device — which might just yield the result “pass/fail”. Is this correct? If yes, this is what I described. To get some rationale you need to provide an acceptance criterion. This might be in the form “take a sample size of 22, and check that all samples are correctly measured/tested”, or you take a (reliability, confidence) pair such as (90%, 90%) and calculate the required sample size. I described the latter format in my first response, but you need to provide the format.
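For what it's worth, the two formats are equivalent under the zero-failure success-run relation sketched earlier: 22 correctly tested samples with no failures corresponds to roughly (90%, 90%). A minimal check, assuming that convention:

```python
from math import ceil, log

confidence = 0.90
reliability = 0.90

# (reliability, confidence) -> required zero-failure sample size
n = ceil(log(1 - confidence) / log(reliability))
print(n)  # -> 22, matching the "sample size of 22" wording above

# and the other direction: what 22 zero-failure samples demonstrate at 90% confidence
print((1 - confidence) ** (1 / 22))  # -> ~0.90 demonstrated reliability
```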
 

Bev D

Heretical Statistician
Leader
Super Moderator
So those statistics (reliability, confidence) are appropriate for acceptance sampling of a lot. Some use them for process validation (a topic of other threads here). For an MSA (which is what qualification of a measurement device requires), the common applicable statistical analyses are the Kappa test (which I find to be too ‘liberal’) and McNemar’s test. Remember that the results of a repeatability test will give you a cross-tab of pass/pass, fail/fail, pass/fail and fail/pass. The region of interest is the pass/fail and fail/pass cells. You can read about this in my paper on measurement system validation (Free - Verification and Validation of Measurement Systems - Elsmar Cove Quality and Business Standards Discussions) in the resources section. There is also an Excel spreadsheet that has most of the common methods and listed references.
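As an illustration of the cross-tab idea (the counts below are made up, not from this thread, and statsmodels is assumed to be available), McNemar's test looks only at the two disagreement cells:

```python
# Hypothetical repeatability cross-tab (trial 1 vs trial 2 on the same cables).
from statsmodels.stats.contingency_tables import mcnemar

#               trial 2: pass   trial 2: fail
table = [[18,              1],   # trial 1: pass
         [ 2,             19]]   # trial 1: fail

# McNemar's test compares the pass/fail and fail/pass counts, i.e. the
# "region of interest" mentioned above.
result = mcnemar(table, exact=True)
print(result.statistic, result.pvalue)
```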

While it is nice to have a ‘representative’ sample (the defect rate of the current process), the best first approach is to challenge the system’s ability to catch the marginal units first; then, if you have the population from which to sample, you move to a representative sample. This isn’t always possible for a new process but can be done for an existing one with a ‘new tester’…

I see no reason not to use 5 bad and 5 good as long as the majority of parts are marginally good and/or bad (3 and 3). The stats will yield a wide confidence interval because of the ‘small’ sample size - but you will learn a LOT about the tester’s repeatability. Remember every statistical test in the world is only as good as the makeup of the sample that is tested…
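To put a number on the "wide confidence interval" point: even if all 10 cables are called correctly, an exact (Clopper-Pearson) 95% interval on the agreement rate has a lower bound of only about 69%. A quick sketch, assuming statsmodels is available:

```python
from statsmodels.stats.proportion import proportion_confint

# 10 cables tested, all 10 called correctly by the new tester (hypothetical result)
lower, upper = proportion_confint(count=10, nobs=10, alpha=0.05, method="beta")
print(lower, upper)  # -> roughly (0.69, 1.0): a wide interval, as noted above
```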
 

Shahar_Waserman

Registered
Thank you, I've downloaded your article and will review it.
 

Semoi

Involved In Discussions
Here is a link to Wayne Taylor's homepage. In the comments he states that the (reliability, confidence) method is applicable not only for "process validation", but also for "test method validation" (TMV). He also recommends focusing on the false acceptance rate, which is not done if we use either the kappa method or McNemar's method. In addition, although McNemar's method is for matched pairs, it assumes that the two values of each pair are equivalent. This is not the case here, because the first is the "true value" while the second is the "measurement" -- at least I hope you possess a reference value for each tested part.

While the tolerance interval is widely used in acceptance sampling, it is a well-accepted method in validation as well. Thus, from my perspective it makes sense to check all sorts of statistics/analyses and run many tests during the development phase. However, once you are done with the optimization, and you are satisfied with the result, you enter the validation phase. During the validation phase I recommend using a simple and standardized method. The tolerance interval is such a method.
 