Questions about Attribute Agreement Analysis

Quick update:

Unfortunately I wasn't able to steer my colleagues away from doing the 2-part study that I mentioned previously (basic attributes in one study, attributes requiring a TAPPI chart in the other study). I was able to convince them to increase the total number of samples for each study from 20 (the minimum per our SOP) to 30, which still isn't ideal, but it is what it is. At the end of the day, we are still trying to "check the box" so to speak.

One thing I'm still struggling with is that I have no idea what the statistical basis is for determining the number of samples. I would like to be able to determine from first principles what the appropriate sample size would be for my study, but I don't even know where the standard rule comes from (50 samples, 3 appraisers, 2 replicates). I know it has something to do with the total number of appraisals.

Finally, I have another issue where one of the attributes that I would like to evaluate is for string flash, which we occasionally see during production. This defect is naturally very fragile, and I highly doubt that the failing sample could hold together for more than 2 or 3 appraisals (if that). Are there any recommendations for attributes like this, where the visual inspection itself has the potential to be destructive?
 
I reviewed the AIAG MSA reference manual and did not find a reference for how they selected their sample size. I then tried Copilot and it first gave me the standard AIAG line then immediately hallucinated when I pressed it further. Next I went with ChatGPT and had better luck. This is what it said:
 
There is NO statistical justification (no matter how much mathematical gymnastics anyone does to ‘prove’ it now) for 50 parts, 3 people and 3 inspections. The only statistic that matters for both attribute and continuous data is the estimation of the repeat standard deviation. This depends on the number of repeats of each part times the number of parts. (It is NOT the total number of observations, and the increase from 2 to 3 repeats does not substantially improve the estimate of the SD. Increasing the number of parts does.) So for both attribute and continuous data the studies are underpowered. And for attributes the real problem (as I have said) is the number of marginal parts as well as the number of good and bad parts; this is not and cannot be statistically determined. For attribute data the best way is to use the sample size determined by the Binomial or Poisson distribution and the actual RQL you have (not the AQL tables, which are only marginally statistically determined themselves; there is no lot size consideration in attribute data estimation). Then you salt the sample with the RQL defect rate. And don’t forget the sample is inspected under real-world conditions (lighting, speed, mix of defects...). At some point it’s not the sample size that matters but the ability to detect marginal defects at production conditions and rate. Not everything that ‘sounds’ statistical is statistical without massive sample sizes. Put your inspection data in a control chart and track per operator. THAT will provide you with the ultimate test of an appraiser’s ongoing ability to detect defects.
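To make the Binomial/Poisson remark concrete, here is a minimal sketch of one common way to size a sample from an RQL. It assumes a zero-acceptance plan (the sample must be large enough that a lot at the RQL defect rate would show at least one defect with high probability) and a consumer risk of 5%. Neither assumption is stated in the post above, and the function names are made up for illustration.

```python
import math


def zero_acceptance_sample_size(rql, consumer_risk=0.05):
    """Smallest n with (1 - rql)**n <= consumer_risk (binomial, zero defects allowed).

    rql and consumer_risk are fractions, e.g. 0.05 for 5%.
    Both the zero-acceptance plan and the 5% default risk are assumptions.
    """
    if not 0 < rql < 1 or not 0 < consumer_risk < 1:
        raise ValueError("rql and consumer_risk must be between 0 and 1")
    return math.ceil(math.log(consumer_risk) / math.log(1 - rql))


def poisson_sample_size(rql, consumer_risk=0.05):
    """Poisson approximation of the same plan: exp(-n * rql) <= consumer_risk."""
    if not 0 < rql < 1 or not 0 < consumer_risk < 1:
        raise ValueError("rql and consumer_risk must be between 0 and 1")
    return math.ceil(-math.log(consumer_risk) / rql)


if __name__ == "__main__":
    for rql in (0.05, 0.10):
        print(rql, zero_acceptance_sample_size(rql), poisson_sample_size(rql))
```

Note that lot size does not appear anywhere in the calculation, which matches the point made above. The two approximations give nearly the same answer at small defect rates.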

As for the fragility of the string, you can use a unique sample set for each appraiser to minimize the effect of handling removing the string. You could even use only the expert panel inspection vs. a single inspection by each appraiser, in case the string actually comes off. But like every other study design, you have to TRY IT to know what happens. Your fear of what might happen is only a fear.
 
And you will get wildly different results depending on how you select your samples (i.e., clearly good/bad vs. marginal and how many of each).
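A rough illustration of why the sample mix matters so much: if appraisers almost always get clear-cut parts right but are much less reliable on marginal ones, the expected agreement with the standard is just a weighted average of the two. The detection rates below (99% and 60%) are invented for illustration only, not measured values.

```python
def expected_agreement(n_clear, n_marginal, p_clear=0.99, p_marginal=0.60):
    """Expected fraction of parts an appraiser calls correctly.

    p_clear and p_marginal are hypothetical per-part probabilities of a
    correct call on clearly good/bad parts and on marginal parts.
    """
    total = n_clear + n_marginal
    if total <= 0:
        raise ValueError("need at least one part")
    return (n_clear * p_clear + n_marginal * p_marginal) / total


if __name__ == "__main__":
    # Same 30-part study, three different mixes of clear vs. marginal parts.
    for n_marginal in (0, 5, 15):
        n_clear = 30 - n_marginal
        print(n_clear, n_marginal, round(expected_agreement(n_clear, n_marginal), 3))
```

With these made-up rates, the same appraiser scores 99% on a study with no marginal parts and roughly 80% on one that is half marginal, without any change in actual ability.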
 
I guess it's not as cut and dry as I had hoped. Thank you both for the valuable info!
“Statistics” in the industrial environment rarely is. You can check my ‘presentations’ in the Resources section; they include case studies, explanations and, most importantly, a vast reference list for personal study.
 
What if I have an attribute where the defect is unable to be replicated in production without significant cost? Are there alternative approaches I can try? Our production team is having a difficult time producing samples with flash.
 