Questions about Attribute Agreement Analysis

WikipediaBrown

Good afternoon,

I am attempting to perform an Attribute Agreement Analysis (AAA) for a number of visual inspections that we perform for product acceptance on injection molded parts. This is my first time executing one of these MSAs, and I have run into a couple of issues.

Most examples of AAA that I've seen online have the inspectors evaluating a single defect type. For our products, the inspectors are expected to check the part for several different defects simultaneously. Some of these defects include things like:
- Contamination
- Flash
- Damage (scratches)
- Etc.

How would I structure my AAA to evaluate all of these at the same time? Would my samples need to have a mix of all the relevant defects? Would each defect need to be equally represented in my sample population? And how would this shake out in my mathematical analysis? I'll be using Minitab to analyze the results.

Also, the inspectors are expected to use TAPPI charts as an aid for some of these inspection criteria (e.g. particle size). Is a TAPPI chart considered a "gauge" in this context? And if so, would a Gauge R&R be more appropriate?

Thanks in advance for any input you can provide.
 
It depends. Are you accepting purely on the size using the TAPPI chart? If so, I would use a Gage R&R for size. If you are asking your Appraisers to classify the defects, you should include all types of defects in the sample and use an AAA with a Kappa analysis to check how consistently they classify the defects. If you are just looking for how consistently they classify parts as defectives (good/bad), you should include all types of defects in the sample and use an AAA only to check how consistently they classify the defectives as good/bad.
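To make the Kappa idea concrete: for two appraisers classifying the same parts, Cohen's kappa compares observed agreement with the agreement expected by chance from each appraiser's marginal frequencies. A minimal sketch (the defect labels and ratings below are hypothetical, not from this thread):

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two appraisers rating the same samples.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is chance agreement from each appraiser's category frequencies.
    """
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two appraisers classifying 10 molded parts
a = ["pass", "flash", "contamination", "pass", "damage",
     "pass", "flash", "pass", "contamination", "pass"]
b = ["pass", "flash", "damage", "pass", "damage",
     "pass", "pass", "pass", "contamination", "pass"]
print(round(cohens_kappa(a, b), 4))  # -> 0.6875
```

Minitab's Attribute Agreement Analysis reports the same statistic (per appraiser pair and overall) from a worksheet laid out as sample / appraiser / trial / result.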
 
In the resources section I have an EXCEL spreadsheet with a multi-defect Kappa calculator. It is called “MSA Tools”.
 
I'm a little bit confused. These inspections are being conducted on a sample population to determine final product acceptance; however, the inspector is not recording any measurements. They just decide Pass/Fail depending on their assessment of whether the defect is bigger or smaller than the Tappi chart reference. For that reason, I was thinking AAA was more applicable.

Would that be the sheet labeled as "Group Kappa"?

Typically, our organization requires a minimum of 20 samples, 3 appraisers, and 2 replicates for an attribute study. However, our procedure also states, "for multiple defects, more samples are required to cover all attributes". How can I determine the proper number of samples to use? I'm not sure how these minimum requirements are derived. If I'm testing for 6 different attributes, do I need to multiply the number of samples by 6 (so 120 total samples)?
 
Also - I have some concerns about the cognitive load required to inspect for so many different attributes in the span of 4 seconds. We have some non-subjective visual inspection criteria that have been excluded from the analysis because they should be immediately apparent (a missing component, for example). But I'm a little worried that the inspectors may miss even these simple criteria as a result of distraction or fatigue. Should I be considering these types of defects in my analysis, or should I consider them outside my scope?
 
It depends on how they are using the Tappi chart. Have you given them a single size on the Tappi chart and told them to determine pass/fail compared to that ONE size? That would be an AAA. However, if they are using the Tappi chart to determine the size of the defect, THEN deciding whether that size is in or out of spec, THEN recording it as pass/fail, it would be a GRR.
 
Attribute studies can be quite complicated particularly for visual inspections (as opposed to go/no-go gauges). Lighting, time allocated, positioning of devices, fatigue, boredom / hypervigilance due to low / high defect rate, etc. all contribute to the ability of an inspector to ‘detect’ defects.

A sample size of 20 is completely underpowered for visual inspections; there is no statistical justification for it at all. In most of my visual inspection MSAs I have run sample sizes in the hundreds, using a 'real life' defect rate for 100% inspections and the RQL defect rate for sample-based lot acceptance inspections. The MSA testing was done at production speeds under production conditions. The real difficulty with attribute studies is that the study design is the critical element: the distribution of 'defectiveness' actually matters more than the % agreement, Kappa score, or McNemar's statistic. (The distribution refers to how many very passing, very defective, and marginally passing/defective parts are in the sample.)

As a point of statistical clarification: % agreement has no statistical element - it's just arithmetic and cannot adjust for the underlying percentage of actually defective parts. The Kappa statistic does adjust for this, but it can still be gamed a bit - particularly in which cutoff value you use. McNemar's statistic does a better job because it focuses on the disagreement diagonal, but it is often overkill for basic production, while it is really essential for medical diagnostics.
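The contrast between the three statistics can be seen on a single hypothetical 2x2 table (appraiser vs. reference, mostly-good parts): % agreement looks excellent while kappa, which corrects for the class imbalance, is mediocre.

```python
# Hypothetical 2x2 agreement table, appraiser decision vs. reference (truth):
#                  truth pass   truth fail
# appraiser pass      a=90         b=6
# appraiser fail      c=2          d=2
a, b, c, d = 90, 6, 2, 2
n = a + b + c + d

# Percent agreement: just the diagonal -- blind to class imbalance.
pct_agree = (a + d) / n

# Cohen's kappa: corrects observed agreement for chance agreement
# implied by the marginal totals.
p_o = (a + d) / n
p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
kappa = (p_o - p_e) / (1 - p_e)

# McNemar's statistic: uses only the disagreement cells b and c,
# asking whether misses and false alarms are balanced.
mcnemar = (b - c) ** 2 / (b + c)

print(pct_agree, round(kappa, 3), round(mcnemar, 2))  # 0.92, 0.296, 2.0
```

Here 92% agreement collapses to a kappa of about 0.30 once chance agreement on the dominant "pass" class is removed, which is exactly the gaming risk described above.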

The other thing to include in your test is a comparison to truth (first inspection results vs. truth), then repeatability (first inspection vs. second inspection).
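Both checks fall out of the same trial data. A minimal sketch, with entirely hypothetical pass/fail records for one appraiser over two trials:

```python
# Hypothetical records: (truth, trial 1, trial 2) for one appraiser,
# "P" = pass, "F" = fail.
records = [
    ("P", "P", "P"), ("P", "P", "P"), ("F", "F", "F"), ("F", "F", "P"),
    ("P", "P", "F"), ("F", "F", "F"), ("P", "P", "P"), ("F", "P", "P"),
]
n = len(records)

# Effectiveness: first-pass agreement with truth.
effectiveness = sum(t == r1 for t, r1, _ in records) / n
# Repeatability: agreement of the appraiser with themselves across trials.
repeatability = sum(r1 == r2 for _, r1, r2 in records) / n
print(effectiveness, repeatability)  # 0.875, 0.75
```

In a full study you would also break the truth comparison into miss rate and false alarm rate separately, since the two error types carry very different costs.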

I am fairly familiar with high-volume part inspection, including in the injection molding industry. One method I've seen is to take a bunch of parts, spread them out on a table, and sort through them quickly to see if there are any defects - all defect types are inspected for at one time. I've been victimized by this approach many times; it is an inherently poor approach. I've also instituted one-part-at-a-time visual inspections where the part passes over a light bar and the inspector picks it up, turns it over, and then sets it down again in the proper orientation. This is a fairly effective approach for complex plastic parts. Can you describe the production method you are using?

There are also some types of defects that are much more common at start-up and some that are more common during steady state production. These defects can be tested in separate attribute study testing. Otherwise they need to be combined.


Oh and AAA is the American Automobile Association…TLAs are just ‘inside baseball’.
 
I'm bound by certain requirements to use the Kappa statistic as my acceptance criteria. It's also a requirement that 50% of my samples should be "failing".

Let's say I have 3 different types of attribute inspections for this part: 1. Simple Pass/Fail, 2. Aided Pass/Fail (requires the use of a TAPPI chart), and 3. Subjective. All inspections are performed on finished goods that are sampled intermittently during production (some at start of run, some in the middle, some at end of run). The inspectors observe the samples under a light booth (225-300 fc) for 4 seconds, and they are supposed to fail the part if they observe any of the listed defects.

Without revealing too much information, there are 20 different defects called out, and it breaks down like this:

Attribute Type    | AQL
Pass/Fail         | 0.065
Pass/Fail         | 0.065
Subjective        | 0.065
Subjective        | 0.025
Aided Pass/Fail   | 0.25
Pass/Fail         | 0.25
Aided Pass/Fail   | 0.4
Aided Pass/Fail   | 0.4
Aided Pass/Fail   | 0.4
Aided Pass/Fail   | 0.4
Aided Pass/Fail   | 1
Aided Pass/Fail   | 1
Aided Pass/Fail   | 1
Aided Pass/Fail   | 1
Subjective        | 1.5
Pass/Fail         | 1.5
Pass/Fail         | 1.5
Subjective        | 1.5
Subjective        | 1.5
Aided Pass/Fail   | 0.65
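One way to sketch how the failing half of a sample could be spread across these defect groups is to weight by AQL as a stand-in for expected occurrence rate (a risk-based weighting would be equally defensible; the 120-sample total below is just the earlier 20-per-attribute guess, purely illustrative):

```python
# AQL values per defect, grouped by attribute type (from the table above).
aqls = {
    "Pass/Fail": [0.065, 0.065, 0.25, 1.5, 1.5],
    "Subjective": [0.065, 0.025, 1.5, 1.5, 1.5],
    "Aided Pass/Fail": [0.25, 0.4, 0.4, 0.4, 0.4, 1, 1, 1, 1, 0.65],
}
total_samples = 120
failing = total_samples // 2  # the "50% failing" internal requirement

# Allocate failing samples to each group proportional to its summed AQL.
total_aql = sum(sum(v) for v in aqls.values())
shares = {g: round(failing * sum(v) / total_aql) for g, v in aqls.items()}
for group, n_fail in shares.items():
    print(group, n_fail)
```

The allocation method itself is the judgment call; whatever weighting is chosen, it should be recorded in the study protocol so the defect mix is defensible.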

Currently my plan was to break it up into two different studies, one for "Subjective" criteria and one for the "Aided" criteria. Simple "Pass/Fail" was going to be excluded at this time due to low relative risk, but I sort of have mixed feelings about it, since it no longer represents a "real" scenario with all the cognitive load/fatigue.
 
Well, sorry that you are "bound" by an (internal only?) 50%-failing-parts requirement. That is very large for the small sample size of 20 you mentioned earlier. In fact it is biased, in that the appraisers will know to look for defects and will in all likelihood remember which defects they called out on the first pass.

That said, any attribute inspection must include all defect types looked for during production. To split up the test by defect types is statistical CHEATING and creates a biased study that doesn’t reflect reality.

But it sounds like your organization is only interested in ‘checking the box’.

One more question: what is the AQL? How is it used? When is it used?
 
Hahaha well I appreciate you putting it plainly.

As far as the "50%" internal requirement goes, I think I can probably subvert it as long as I provide a sufficient rationale.

For sampling, I am using ANSI/ASQ Z1.4 Single, Normal, General Level II.
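For reference, the Z1.4 single/normal, General Level II lookup is a simple lot-size-to-code-letter mapping. The values below are transcribed from memory of the published tables and abridged, so verify against the standard itself before use:

```python
# Abridged ANSI/ASQ Z1.4 lookup, General Inspection Level II,
# single sampling, normal inspection. Transcribed from memory of the
# published tables -- verify against the standard before relying on it.
CODE_LETTERS = [  # (max lot size, code letter, sample size n)
    (8, "A", 2), (15, "B", 3), (25, "C", 5), (50, "D", 8),
    (90, "E", 13), (150, "F", 20), (280, "G", 32), (500, "H", 50),
    (1200, "J", 80), (3200, "K", 125), (10000, "L", 200),
    (35000, "M", 315), (150000, "N", 500), (500000, "P", 800),
]

def z14_sample_size(lot_size):
    for max_lot, letter, n in CODE_LETTERS:
        if lot_size <= max_lot:
            return letter, n
    return "Q", 1250  # lot sizes over 500,000

print(z14_sample_size(2500))
```

The accept/reject numbers then come from the AQL columns of the standard's master table for the resulting code letter; those are not reproduced here.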
 