Performing Gage R&R on Automated Measuring Systems, a consensus?

Jay Kay

Registered
Hi,

So, I know there are a bunch of threads on this topic; however, many of them contain disagreements or differing methods and opinions, so perhaps this new one could reach a consensus or at least further the discussion.

Situation and thoughts....
Situation: Let's say you're tasked with doing GRR, or otherwise characterizing and understanding the acceptability of the automated measurement system used in discrete manufacturing (D-MFG), i.e. understanding how "good" the measurement systems are. By D-MFG I mean final end-of-line testing that ultimately results in PASS/FAIL, based on continuous data. Let's say 3 different specifications are measured in UOM1, UOM2, UOM3 (UOM = Unit of Measure). No human interaction.

Would you do this using a series of Type 1 gage studies on the three different continuous data sets from the automated end-of-line testing? One part measured 50 times against each of those three specs, using 10% of tolerance for Cg/Cgk (the more conservative basis) with a std dev multiplier of 6 for criticality. You would do this for UOM1, UOM2, and UOM3, and based on the Cg and Cgk you could characterize bias and how good the automated test station is?
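For reference, the Type 1 study described above boils down to two capability ratios computed from repeated measurements of one part. A minimal sketch, assuming the Minitab-style formulas with K = 10 (percent of tolerance, the conservative basis mentioned) and L = 6 (study-variation multiplier); the function name and arguments are illustrative, not from any particular tool:

```python
import statistics

def type1_gage(measurements, reference, lsl, usl, k_percent=10, sv_mult=6):
    """Type 1 gage study sketch: one part measured repeatedly on one gage.

    Returns (Cg, Cgk, bias) using the common definitions:
      Cg  = (K/100 * tolerance) / (L * s)
      Cgk = (K/200 * tolerance - |bias|) / (L/2 * s)
    """
    tol = usl - lsl
    xbar = statistics.mean(measurements)
    s = statistics.stdev(measurements)   # sample std dev of the repeats
    bias = xbar - reference              # offset from the reference value
    cg = (k_percent / 100 * tol) / (sv_mult * s)
    cgk = (k_percent / 200 * tol - abs(bias)) / (sv_mult / 2 * s)
    return cg, cgk, bias
```

With zero bias, Cg and Cgk coincide; the usual acceptance threshold (e.g. 1.33) is a separate policy decision.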

Now let's say there are three test stations. If Type 1 is the way to go, would you run it on each station, so three sets of UOM data per station? That would be 3 stations x 3 UOMs = 9 Type 1 gage studies using the same parts.

OR

would you do an attribute GRR with 20 parts (10 good and 10 bad), two replicates on each automation station? Included in the results would be alpha and beta error percentages and screen vs. effectiveness percentages. This method would basically look at the discrete output of the continuous measurements: if all three tests pass, the station says PASS; if any one of the three tests fails, it says FAIL, and the attribute study records a fail.
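The alpha/beta and effectiveness figures mentioned above are simple counting ratios over the known-reference parts. A small sketch, assuming these common definitions (effectiveness = correct calls / total calls; beta/miss rate = bad parts called good; alpha/false-alarm rate = good parts called bad) — the function name and record layout are made up for illustration:

```python
def attribute_metrics(records):
    """records: list of (reference, call) tuples, each 'pass' or 'fail'."""
    correct = sum(1 for ref, call in records if ref == call)
    bad = [(r, c) for r, c in records if r == 'fail']    # known-bad parts
    good = [(r, c) for r, c in records if r == 'pass']   # known-good parts
    miss = sum(1 for r, c in bad if c == 'pass')         # beta errors
    false_alarm = sum(1 for r, c in good if c == 'fail') # alpha errors
    return {
        'effectiveness': correct / len(records),
        'miss_rate': miss / len(bad) if bad else 0.0,
        'false_alarm_rate': false_alarm / len(good) if good else 0.0,
    }
```

Note that with only 10 good and 10 bad parts, each of these rates moves in 10% steps per miscall, which is why attribute studies need larger samples to resolve small error rates.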


OR

would you do a GRR with 10 samples, 3 trials, and 3 stations, but treat the automated test stations as the appraisers, so reproducibility is based on the automation equipment rather than human operators? Basically run a normal gage R&R where the "operators" are just the automation equipment. I'm not sure of the validity of this, since the other element takes EV into account...
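Treating stations as appraisers works mechanically because the crossed GRR ANOVA doesn't care whether the "operator" factor is human. A minimal sketch of the balanced-design variance-component arithmetic (standard expected-mean-squares formulas, negative estimates clipped to zero); all names are illustrative:

```python
import itertools
import statistics

def grr_anova(data):
    """Crossed GRR for a balanced design; data[(part, station)] = list of
    r repeat measurements. Returns variance-component estimates."""
    parts = sorted({p for p, s in data})
    stations = sorted({s for p, s in data})
    r = len(next(iter(data.values())))
    p, o = len(parts), len(stations)
    grand = statistics.mean(x for v in data.values() for x in v)
    part_mean = {pp: statistics.mean(x for ss in stations for x in data[pp, ss])
                 for pp in parts}
    stn_mean = {ss: statistics.mean(x for pp in parts for x in data[pp, ss])
                for ss in stations}
    cell_mean = {k: statistics.mean(v) for k, v in data.items()}
    # Sums of squares for part, station, interaction, and error
    ss_part = o * r * sum((part_mean[pp] - grand) ** 2 for pp in parts)
    ss_stn = p * r * sum((stn_mean[ss] - grand) ** 2 for ss in stations)
    ss_int = r * sum((cell_mean[pp, ss] - part_mean[pp] - stn_mean[ss] + grand) ** 2
                     for pp, ss in itertools.product(parts, stations))
    ss_e = sum((x - cell_mean[k]) ** 2 for k, v in data.items() for x in v)
    ms_part = ss_part / (p - 1)
    ms_stn = ss_stn / (o - 1)
    ms_int = ss_int / ((p - 1) * (o - 1))
    ms_e = ss_e / (p * o * (r - 1))
    return {
        'repeatability': ms_e,                            # EV
        'station': max((ms_stn - ms_int) / (p * r), 0.0), # AV (reproducibility)
        'interaction': max((ms_int - ms_e) / r, 0.0),
        'part': max((ms_part - ms_int) / (o * r), 0.0),
    }
```

If the stations really are interchangeable, the `station` component simply comes out near zero, which answers the validity concern: the model isn't invalidated, the reproducibility term just becomes negligible.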


I think the idea is to qualify, using statistics, that the equipment is valid beyond just calibration and IQ/OQ/PQ, and then track yields at each step.


I have seen suppliers simply use a Type 1 gage study, or a series of them, call it a day, and declare the equipment statistically sound....
 

Bev D

Heretical Statistician
Leader
Super Moderator
There are multiple valid ways to handle this type of situation, depending on the type of characteristics and the risk of passing a failing part or rejecting a passing part. It also depends somewhat on the logistics of performing repeated tests on the same units.

The other thing to consider is the intent of the person running the test who might answer this question: are they just looking to check the box and move on? Then they will suggest the least insightful, easiest-to-run test that is most likely to pass. If they really care about understanding the system, they will likely suggest the more involved test method. You will never achieve 'consensus' in this kind of environment.

In your example I never recommend a 'Type 1' kind of approach, as automated test/inspection systems depend on how the part is presented to the active measurement system and on how close the part is to nonconforming. (Unless, of course, all you care about is checking the box.)

I also go beyond the stupid AIAG approach of 10x3x3 for continuous data and use 30 parts measured twice. If it is automated equipment and the operator has no real interaction in loading the parts, then I use only one operator. If the operator has substantial interaction, then I use 3 operators.
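The 30-parts-measured-twice design above yields a repeatability estimate directly from the paired duplicates. A short sketch, assuming the standard pooled within-part estimator s = sqrt(sum(d_i^2) / (2n)), where d_i is the difference between the two measurements of part i; the function name is made up:

```python
import math

def repeatability_from_duplicates(pairs):
    """pairs: list of (first, second) measurements of each part.
    Returns the pooled repeatability standard deviation."""
    n = len(pairs)
    # Each pair contributes one degree of freedom: d^2 / 2 estimates the
    # within-part variance, pooled across all n parts.
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / (2 * n))
```

With 30 parts this gives 30 degrees of freedom for repeatability, versus the 10x3x3 design's heavier reliance on replication within fewer part/operator cells.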

For categorical data - especially for attribute measurements like visual inspection - I use 50 parts, as a reasonable estimate of the rate of correct pass/fail calls rests on a categorical-data standard deviation, which requires a larger sample size to be reasonably accurate. I also tend to use more 'marginal' parts, as this is what actually challenges the system.
 

Ron Rompen

Trusted Information Resource
I am working on much the same type of problem (GRR of leak testers) and I have come up with a solution which fits MY criteria, and which my customer is also satisfied with.
There are 2 leak testers, identical in design and function.
I have 2 known test samples, 1 PASS and 1 FAIL. The 'real' leak rate of these two parts is unknown - I only know that one has passed, and one has failed.
I will be performing a Type 1 gauge study.

Each sample will be measured 10 times (remove and replace) and the average of these 10 measurements will be used to determine the REFERENCE value of the part.
After this step has been completed, each part will be measured 50 times, on each leak tester. (A total of 100 measurements for each part).
Data will be analyzed via Minitab for Cg, Cgk %Variability and Bias.
 

Semoi

Involved In Discussions
As Bev stated above, the best method is to analyse preliminary test datasets to (1) optimise the systems and (2) determine its weaknesses. Next, you should design the qualification in such a way that you ensure that the weaknesses appear with an acceptable rate. Hence, do not expect to find a method which is perfect for all situations. Nevertheless, I'd like to take the statistical perspective on your three options:

Option 1: For each part and each measurement device you run a Type 1 Gauge R&R study
From my perspective this method ensures that each measurement device is able to measure each part with an acceptable uncertainty. My main concern with this approach is that it is questionable whether the reference parts are representative of the population.

Option 2: Use an attribute GRR
Only use attribute results if numerical results are not available. The required sample sizes explode if you wish to reach the acceptance regimes that are usual for numerical results.

Option 3: Use a Type 2 GRR
From a statistical perspective there is nothing wrong with this approach. If you look at each term in the model
Y_meas(i, j, k) = Y_part(i) + Y_op(j) + eps(i, j, k)
i = 1, ..., number of parts
j = 1, ..., number of operators (or, in your case, number of meas. devices)
k = 1, ..., number of repeated measurements
you find that the analysis estimates the (random) operator component sigma_op, which defines the operator term Y_op ~ N(0, sigma_op) in the model equation. The model is NOT WRONG if the operator (measurement device) component is "small" -- the term simply becomes negligible. You could also add an interaction term, and the model is still not wrong -- only if the estimated interaction component sigma_part:op comes out negative is the model wrong, and then you should either drop the interaction term or use a different evaluation algorithm.

Conclusion:
From a statistical perspective there are two options. However, I would only use option 3 if somebody else demands a "large" sample size for the qualification: instead of using that large sample size on each device separately, we can combine the results of all measurement devices and thus perform fewer runs on each device.
 