Ordinal Data Gage Study - Kappa Scores vs. Kendall's Coefficient




We are having an issue understanding the output from a recent gage study we completed.
Gage System Background: We have a process in our operation in which inspectors rate units on a scale of 1 to 5 (1 is "best" and 5 is "worst"). To verify adequate training, we wanted to "test" two new inspectors using an ordered attribute agreement analysis.
The testing process included:
1. Selected ten parts.
2. Each part was rated by the manager of the department to determine the "standard" value.
3. Each inspector was then called in to rate the ten parts.
4. This process was repeated once.
The data were entered into Minitab for analysis.
Our issue:
The confusion comes from the kappa scores (within appraisers, between appraisers, and each appraiser vs. the standard) compared to the Kendall's coefficient results. The kappa scores for some of the samples/appraisers are negative. As I understand it, this indicates that the measurement system is worse than random chance. However, the Kendall's coefficients are 0.9 or better, which suggests that the measurement system is excellent.
Please help
This is my first post, so please forgive me if I have done something incorrect.


  • Ordinal Data.xlsx


Forum Moderator
Re: Ordinal Data Gage Study - Kappa scores vs. Kendall's Coefficient

Kappa and Kendall's coefficient measure two different aspects of the measurement system.

Kappa measures how well everyone correctly identifies the exact standard value. If they are one category off high or low, they get zero credit.

Kendall's coefficient measures the degree of correlation between everyone and the standard. For example, if the standard was 4 and everyone rated it as 5, Kendall's gives them credit for being close; kappa gives them none.
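To make the distinction concrete, here is a small illustrative sketch with toy data (not the attached file). It hand-rolls Cohen's kappa and Kendall's tau-a rather than reproducing Minitab's exact statistics, so treat it as a demonstration of the idea, not of Minitab's output:

```python
# Toy data (assumed for illustration): the rater is consistently one
# category above the standard -- a pure upward bias, never an exact match.
standard = [1, 2, 3, 4]
rater = [2, 3, 4, 5]

def cohens_kappa(x, y):
    """Exact-match agreement, corrected for chance agreement."""
    n = len(x)
    p_o = sum(a == b for a, b in zip(x, y)) / n                   # observed agreement
    cats = set(x) | set(y)
    p_e = sum((x.count(k) / n) * (y.count(k) / n) for k in cats)  # chance agreement
    return (p_o - p_e) / (1 - p_e)

def kendall_tau(x, y):
    """Kendall's tau-a: do the two raters order the parts the same way?"""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)        # +1 concordant pair, -1 discordant
    return score / (n * (n - 1) / 2)

print(f"kappa = {cohens_kappa(standard, rater):.3f}")  # -0.231: worse than chance
print(f"tau   = {kendall_tau(standard, rater):.3f}")   #  1.000: perfect ordering
```

The rater never hits the standard exactly, so kappa is negative, yet every pair of parts is ranked in the same order as the standard, so Kendall's correlation is perfect.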

Bottom line, your inspectors correlate well with the standard. That is, if the standard is high, they rate it high. If it is low, they rate it low. On the other hand, they apparently do not repeat well. In other words, they still miss the standard by one or two categories.