Need help with a Kappa Study (MSA) - Reviewing x-ray Images

Jim Wynne

Adam said:
Thanks Jim.

I think we're in agreement here. I've spent a good while counseling folks on PGA - practical, graphical, then analytical - as an approach to problem solving and project work. I'm certainly not looking for the four-decimal solution, but I do think a combination of a graphical approach with some quantification, where it makes sense, is a good way to go.

In this case I was looking specifically to see whether anyone here had experience with this extension of Fleiss' Kappa. I'm not intending an academic random walk, just looking for a specific tool.

Cheers.

This is frankly over my head as to details (it doesn't take much :D) but we do have some people here who can give you some good advice.
 

NumberCruncher

Hi Adam

All I ended up doing was calculating agreement by counts.

Three staff view an item (or an image, in your case) a total of 3 times each, so a given fault category can be recorded a maximum of 9 times.
The headline figure for agreement is (number of agreeing views) / (number of views):

(9 faults of type A) / (9 views) = 100%
(8 faults of type A) / (9 views) = 88.9% (one person found fault A on only 2 of their 3 viewings).

Of course, the problem with this is that you can never have less than 50% agreement in any category.
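For illustration, here is a minimal Python sketch of that headline figure. The data layout and the function name are my own assumptions (the original work was done in Excel): each appraiser's viewings are stored as sets of recorded fault categories, and agreement is taken as the share of viewings matching the majority verdict on the category, which reproduces the 8/9 = 88.9% example and explains the 50% floor (with 9 yes/no verdicts the majority is always at least 5).

```python
from collections import Counter

def headline_agreement(views, category):
    """Share of all viewings that agree with the majority verdict on `category`.

    `views` is assumed to map each appraiser to a list of sets, one set of
    recorded fault categories per viewing of a single item.
    """
    verdicts = [category in viewing
                for appraiser_views in views.values()
                for viewing in appraiser_views]
    majority_count = max(Counter(verdicts).values())
    return majority_count / len(verdicts)

# Example from the post: one person missed fault "A" on one of their 3 viewings,
# so 8 of the 9 viewings agree -> 88.9%.
views = {
    "Staff1": [{"A"}, {"A"}, {"A"}],
    "Staff2": [{"A"}, set(), {"A"}],
    "Staff3": [{"A"}, {"A"}, {"A"}],
}
print(headline_agreement(views, "A"))  # 0.888...
```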

So I defined other statistics based upon the same basic idea.

Within staff agreement = (number of items with total agreement) / (number of items viewed)

Staff #1 viewed 100 items. On 95 of them, all three viewings agreed (either no faults at all, or the same fault every time). Self-agreement = 95/100 = 95%.
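A similar sketch for the within-staff figure, again under an assumed layout: for one appraiser, a dictionary maps each item to the list of fault-category sets recorded on its repeated viewings. The helper name self_agreement is hypothetical.

```python
def self_agreement(item_views):
    """Fraction of items on which the appraiser gave the identical verdict on
    every viewing (the same set of fault categories each time, possibly the
    empty set, i.e. no faults)."""
    consistent = sum(1 for viewings in item_views.values()
                     if len({frozenset(v) for v in viewings}) == 1)
    return consistent / len(item_views)

# Toy example: 4 items, 3 viewings each; the appraiser changed their mind on item 4.
item_views = {
    "item1": [set(), set(), set()],
    "item2": [{"A"}, {"A"}, {"A"}],
    "item3": [{"A", "B"}, {"A", "B"}, {"A", "B"}],
    "item4": [{"A"}, set(), {"A"}],
}
print(self_agreement(item_views))  # 3/4 = 0.75
```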

You get the basic idea: measure agreement by the number of items, or by the number of faults in each category, then divide by the maximum count you could get (items or faults, respectively). With a bit of thought you can work out what the various statistics mean and decide which ones are most appropriate for your purpose.

If you really want to go to town on this, generate a matrix of staff agreement for every round of viewing, for 3 staff and 3 views (A, B, C = staff; 1, 2, 3 = view number):

      A1    A2    A3    B1    B2    B3    C1    C2    C3
A1    1
A2    .96   1
A3    .94   .98   1
B1    .78   .78   ...   1
B2    .76   .81   ...   ...   1
B3    .59   .59   ...   ...   ...   1
C1    .88   .89   ...   ...   ...   ...   1
C2    .91   .96   ...   ...   ...   ...   ...   1
C3    .82   .84   ...   ...   ...   ...   ...   ...   1

This will allow you to see exactly who agrees with whom and by how much.
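If scripting is an option, the same pairwise table could be generated along these lines. This is only a sketch under an assumed layout (each (staff, view) column maps items to the set of faults recorded), not the spreadsheet approach described below; pairwise_agreement is a hypothetical helper.

```python
from itertools import combinations

def pairwise_agreement(ratings):
    """For every pair of (staff, view) columns, return the share of items on
    which the two columns recorded exactly the same set of fault categories."""
    matrix = {}
    for col1, col2 in combinations(sorted(ratings), 2):
        items = ratings[col1].keys()
        same = sum(1 for item in items if ratings[col1][item] == ratings[col2][item])
        matrix[(col1, col2)] = same / len(items)
    return matrix

# Toy data: 2 staff x 2 viewing rounds, 4 items each (sets of fault categories).
ratings = {
    ("A", 1): {1: set(), 2: {"A"}, 3: {"B"}, 4: {"A"}},
    ("A", 2): {1: set(), 2: {"A"}, 3: {"B"}, 4: set()},
    ("B", 1): {1: {"A"}, 2: {"A"}, 3: set(), 4: {"A"}},
    ("B", 2): {1: set(), 2: {"A"}, 3: set(), 4: {"A"}},
}
for pair, value in sorted(pairwise_agreement(ratings).items()):
    print(pair, round(value, 2))
```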

It's a lot of work to do manually. In my case I can't even get all of the data onto one Excel worksheet: I only have 256 columns available and over 300 columns of data. (By the time I've finished: 12 staff, 3 viewings, 10 fault categories, 50 items; all data entered by me; every item labelled with a removable number and randomised for each round of viewing; all calculations built by hand in Excel. At the moment I'm having bad dreams about being attacked by fault categories and COUNTIF functions!)

The above method is crude and not based on any statistical theory, but it does summarise the data concisely. You will find that you need to make a lot of use of the COUNTIF function.

I would post you part of my spreadsheet, but I'm not at work at the moment and in any case, it's a bit of a mess (making it up as you go along, you know the sort of thing).

Hope this helps.

NC
 