Need help with a Kappa Study (MSA) - Reviewing x-ray Images

AdamP

Hi folks,

I'm hoping some of you can point me in the right direction on a particular type of measurement assessment. I've searched the forum with no luck, and Google and other web searches haven't turned up the paper I know exists that explains what I need.

What I'm after is an extension of Fleiss' Kappa - a measure of agreement between raters often used in attribute MSAs. I put this post here in Six Sigma land since this really isn't a Gage R&R sort of thing.

My scenario: a group of radiologists and clinicians review x-ray images, and each image may have no errors or may have 1 or more of 13 categories of errors (nominal data). Each rater sees the same set of sample images and can assign multiple error categories to each sample. The raters do not have to assign an equal number of errors to the samples.

Cohen's Kappa (overall agreement between 2 raters) and Fleiss' Kappa (kappa by category) won't get me there, so the attribute agreement option in recent versions of Minitab is not helpful.
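
To make the data shape concrete, here's a rough sketch of the closest thing I can already run (Python/statsmodels rather than Minitab; the rater names, image IDs and error codes are all made up): expand each image into 13 present/absent attributes and compute Fleiss' kappa separately for each error type. It runs, but it treats the 13 categories as independent yes/no questions, which is exactly the assumption I'd like to get past.

Code:
import numpy as np
from statsmodels.stats.inter_rater import fleiss_kappa

ERROR_TYPES = range(1, 14)         # the 13 nominal error categories
RATERS = ["R1", "R2", "R3", "R4"]  # hypothetical raters

# Each rater's call on an image is the set of error codes flagged
# (an empty set means "no errors found"). Toy data for illustration.
ratings = {
    "img01": {"R1": {2, 5}, "R2": {2},  "R3": {2, 5}, "R4": set()},
    "img02": {"R1": set(),  "R2": {7},  "R3": set(),  "R4": set()},
    "img03": {"R1": {11},   "R2": {11}, "R3": {11},   "R4": {3, 11}},
}

for err in ERROR_TYPES:
    # images x 2 table of counts: raters calling the error absent vs present
    table = np.array(
        [[len(RATERS) - sum(err in calls[r] for r in RATERS),
          sum(err in calls[r] for r in RATERS)]
         for calls in ratings.values()]
    )
    # kappa is undefined when the error is never, or always, flagged
    if (table[:, 1] == 0).all() or (table[:, 0] == 0).all():
        continue
    print(f"error type {err:2d}: Fleiss' kappa = {fleiss_kappa(table):.2f}")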

Any help is really appreciated!

Cheers,

Adam
 

Bev D

Heretical Statistician
Leader
Super Moderator
Re: Need help with a Kappa study (MSA)

so you are looking for a kappa score for multiple raters and multiple categories?
 
AdamP

Re: Need help with a Kappa study (MSA)

Hi Bev,

A bit more than that actually. Multiple raters with multiple categories can be handled by Fleiss' Kappa as long as the categories are independent and the raters assign an equal number of ratings - and only 1 per sample.

I need the case of multiple raters (the same set of raters) assessing multiple samples, where each rater can assign 1 or more categories per sample and the raters do not need to assign the same number of categories to the samples.

So when several radiologists look at the same sample of x-ray images, each determines whether an image has no errors or some number of the possible error types, which are nominal.
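
For what it's worth, the closest thing I've turned up so far isn't a kappa at all: Krippendorff's alpha with Passonneau's MASI distance, which scores partial overlap between the sets of error codes two raters assign to the same image. A rough sketch using NLTK follows (toy data throughout, with an explicit "no error" code so the label sets are never empty):

Code:
from nltk.metrics.agreement import AnnotationTask
from nltk.metrics.distance import masi_distance

NO_ERROR = 0  # explicit "no errors found" code so label sets are never empty

# Triples of (rater, image, frozenset of error codes assigned)
data = [
    ("R1", "img01", frozenset({2, 5})),
    ("R2", "img01", frozenset({2})),
    ("R3", "img01", frozenset({2, 5})),
    ("R1", "img02", frozenset({NO_ERROR})),
    ("R2", "img02", frozenset({7})),
    ("R3", "img02", frozenset({NO_ERROR})),
    ("R1", "img03", frozenset({11})),
    ("R2", "img03", frozenset({11})),
    ("R3", "img03", frozenset({3, 11})),
]

task = AnnotationTask(data=data, distance=masi_distance)
print(f"Krippendorff's alpha with MASI distance: {task.alpha():.2f}")

MASI gives partial credit when one rater's set of codes is a subset of another's, so flagging {2} against a colleague's {2, 5} isn't treated as a complete miss. It isn't the Fleiss extension I'm after, but it does respect the set-valued nature of the ratings.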

Cheers,

Adam
 

Jim Wynne

Leader
Admin
Re: Need help with a Kappa study (MSA)

I may have missed something, but I'm not sure it's clear what you hope the study will tell you.
 
AdamP

Hi Jim,

Nothing radical here - like most attribute cases, we're looking for the degree of agreement between raters. But rather than fabricate an "MSA" that is dependent on the limit of the "tool", we need to assess the actual procedure used, so it's necessary to account for the possibility of raters seeing more than 1 error per sample.

Cheers,

Adam
 

Bev D

Heretical Statistician
Leader
Super Moderator
I'm not sure of the correct statistical analysis - I'd just plot the data out and see where it leads me, before even trying to think about a statistical test. Of course with defect data the plot wouldn't necessarily be a graph but a matrix showing the raters, samples and errors 'detected'.
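
Something along these lines, just as a sketch (pandas, with made-up column names and toy data), gets you that matrix from a long table of (image, rater, error) records:

Code:
import pandas as pd

# Long-format records: one row per (image, rater, error flagged)
records = pd.DataFrame({
    "image": ["img01", "img01", "img01", "img02", "img03", "img03"],
    "rater": ["R1", "R1", "R2", "R3", "R1", "R2"],
    "error": [2, 5, 2, 7, 11, 11],
})

# Matrix of images x raters; each cell lists the errors that rater flagged
matrix = (
    records.groupby(["image", "rater"])["error"]
    .agg(lambda errs: ",".join(str(e) for e in sorted(errs)))
    .unstack(fill_value="none")
)
print(matrix)

# And a quick count of which error types each rater calls most often
print(pd.crosstab(records["error"], records["rater"]))

In real data you'd want an explicit "no errors" record for each review, so a blank cell can't be confused with an image the rater never saw.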
 

Jim Wynne

Leader
Admin

Presumably you're planning on doing something with the data once an analysis is complete, which is what I was asking about and didn't make clear. I guess the correct question is, why are you concerned with degree of agreement between raters?
 
AdamP

OK, well - we're concerned about rater agreement for several reasons, though I think we're getting off track.

The overall setting is a university hospital/clinic with both seasoned radiologists and newer residents. From a patient's perspective, on any given day the image from your scan might be read and decided on by whichever radiologist is on duty. Confidence that there is a high degree of inter-rater agreement matters to the patient, to the staff and, downstream, to the insurance providers.

Within the scope of the project being worked, ensuring a "first time right" way of working also pushes us toward a high degree of rater agreement to reduce potential rework. In this case rework not only means longer cycle times but, much more importantly, can mean exposing patients unnecessarily to more radiation.

From a continuous improvement perspective, good rater agreement allows a solid PDCA-style loop between the radiologists reviewing the images and the technicians producing them under the various protocols used.

I was hoping someone here in the Cove had run across this sort of kappa before. I'll keep digging and will share whatever I find.

Cheers,

Adam
 

Jim Wynne

Leader
Admin

Adam, there is a difference between a purely academic question and a question about something that has practical application. The more that people know about what you're trying to do and why, the better advice you will receive.

As Bev suggested, there is more than one way to skin this particular cat. If you start with the data and examine it graphically, you may find that you've learned what you need to learn - for example, that certain types of image defects are more likely to produce identification errors than others. I doubt you need to understand agreement out to four decimal places.
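
Just to illustrate what I mean (a rough sketch with made-up data): the share of images on which every rater makes the same call about a given error type is trivial to tabulate, and it will point you at the troublesome defect categories faster than a kappa carried to four decimals.

Code:
RATERS = ["R1", "R2", "R3"]
ERROR_TYPES = range(1, 14)

# Each rater's call per image: the set of error codes they flagged
ratings = {
    "img01": {"R1": {2, 5}, "R2": {2},  "R3": {2, 5}},
    "img02": {"R1": set(),  "R2": {7},  "R3": set()},
    "img03": {"R1": {11},   "R2": {11}, "R3": {11}},
}

for err in ERROR_TYPES:
    # an image counts as agreed when every rater flags it, or none do
    agreed = sum(
        len({err in calls[r] for r in RATERS}) == 1
        for calls in ratings.values()
    )
    print(f"error type {err:2d}: all raters agree on {agreed}/{len(ratings)} images")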
 
AdamP

Thanks Jim.

I think we're in agreement here. I've spent a good while counseling folks on PGA - practical, graphical, then analytical - as a way to approach problem solving and project work. I'm certainly not looking for the four-decimal solution, but I do think a graphical approach combined with some quantification where it makes sense is a good way to go.

In this case I was asking specifically whether anyone here had experience with this extension of Fleiss' Kappa. I'm not after an academic random walk, just looking for a specific tool.

Cheers.
 