Best-Case Kappa Scores Calculated Retrospectively From EEG Report Databases

01 June 2013


Purpose: The most widely used metric for interrater reliability in electroencephalography is the kappa (κ) score. Calculating κ is laborious, requiring EEG readers to read the same EEG studies. We introduce a method to determine a best-case kappa score (κ(BEST)) for measuring interrater reliability between EEG readers retrospectively.

Methods: We incorporated 1 year of EEG reports read by four adult EEG readers at our institution. We used SQL queries to extract EEG findings for subsequent analysis. We generated logistic regression models for particular EEG findings as a function of patient age, location acuity, and EEG reader. We derived a novel measure, the κ(BEST) statistic, from the logistic regression coefficients.

Results: Increasing patient age and location acuity were associated with decreased sleep and increased diffuse abnormalities. For certain EEG findings, the EEG reader exerted the dominant influence, manifesting directly as lower between-reader κ(BEST) scores for those findings. Within-reader κ(BEST) control scores were higher than between-reader scores, suggesting internal consistency.

Conclusions: The κ(BEST) metric can measure significant interrater reliability differences between any number of EEG readers and reports retrospectively, and is generalizable to other domains (e.g., pathology or radiology reporting). We suggest using this metric as a guide or starting point for focused quality control efforts.
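As a rough illustration of the retrospective workflow summarized above, the sketch below fits per-reader logistic regression models for a binary EEG finding (adjusting for patient age and location acuity) and then compares two readers through a best-case kappa computed from their case-mix-adjusted reporting rates, using the standard maximum-achievable-kappa-given-marginals formula. The column names (`age`, `acuity`, `reader`, `finding`) and this particular derivation of κ(BEST) are assumptions for illustration; the paper's own derivation from the regression coefficients may differ.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression


def best_case_kappa(p_a: float, p_b: float) -> float:
    """Maximum achievable kappa for a binary finding when two readers report it
    at marginal rates p_a and p_b (assumed formulation, not the paper's exact one)."""
    p_obs_max = 1.0 - abs(p_a - p_b)                  # best possible observed agreement
    p_chance = p_a * p_b + (1 - p_a) * (1 - p_b)      # chance agreement from the marginals
    if p_chance == 1.0:
        return 1.0
    return (p_obs_max - p_chance) / (1.0 - p_chance)


def adjusted_rate(reports: pd.DataFrame, reader: str, reference: pd.DataFrame) -> float:
    """Fit a logistic model of the finding on age and acuity for one reader, then
    average the predicted probabilities over a shared reference case mix."""
    sub = reports[reports["reader"] == reader]
    model = LogisticRegression().fit(sub[["age", "acuity"]], sub["finding"])
    return model.predict_proba(reference[["age", "acuity"]])[:, 1].mean()


# Hypothetical report database, extracted via SQL into a DataFrame with columns:
# reader, age, acuity (e.g., 0 = outpatient .. 2 = ICU), finding (0/1).
reports = pd.DataFrame({
    "reader":  ["A"] * 6 + ["B"] * 6,
    "age":     [25, 40, 67, 72, 55, 33, 28, 44, 70, 75, 58, 35],
    "acuity":  [0, 1, 2, 2, 1, 0, 0, 1, 2, 2, 1, 0],
    "finding": [0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0],
})

reference = reports[["age", "acuity"]]   # common case mix used to standardize both readers
p_a = adjusted_rate(reports, "A", reference)
p_b = adjusted_rate(reports, "B", reference)
print(f"best-case kappa (reader A vs. B): {best_case_kappa(p_a, p_b):.2f}")
```

Because the two readers are compared on the same reference case mix, differences in the adjusted rates reflect reader effects rather than differences in the patients they happened to read, which is the sense in which the resulting kappa is a "best-case" bound.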