AI Model Timeline
Key model releases from the Transformer (2017) to today – filterable by category (LLMs, DLLMs, VLMs, Agents) and organization. Entries are marked where a PaperTrace deep-dive is available and where weights are open.
Extended thinking and improved agentic capabilities – powers Claude Code. Released Feb 5, 2026.
Frontier performance across coding, agents, and professional work. Released Feb 17, 2026.
Native multimodal reasoning and agentic tool use – Google's strongest model at launch.
Efficient MoE with auxiliary-loss-free load balancing and multi-token prediction – strong coding and math at low training cost.
o1-level reasoning via RL, fully open weights under an MIT license – shocked markets.
Best coding model at launch – outperforms GPT-4o on most benchmarks.
Best open-weight model – competitive with GPT-4 Turbo on most tasks.
Group Relative Policy Optimization – PPO-style RL that drops the critic network, estimating advantages relative to a group of sampled responses.
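A minimal sketch of GRPO's core idea, the group-relative advantage: instead of a learned value function, each sampled response's reward is normalized against the mean and standard deviation of its group. All names here are illustrative, not from any particular implementation.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Advantage of each response: (reward - group mean) / group std.

    This group baseline stands in for the critic's value estimate."""
    mu = mean(rewards)
    sigma = stdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# Four sampled responses to one prompt, scored by a reward model:
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

The advantages sum to zero within each group, so above-average responses are reinforced and below-average ones penalized without ever training a critic.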
Direct Preference Optimization – trains directly on preference pairs, eliminating the explicit reward model and PPO loop from RLHF.
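A minimal sketch of the DPO objective for a single preference pair, assuming `logp_*` are summed log-probabilities of the chosen/rejected responses under the policy and a frozen reference model; the function and argument names are illustrative.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * (policy log-ratio margin between chosen and rejected))."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The loss shrinks as the policy raises the chosen response's likelihood relative to the rejected one (measured against the reference model), so a single supervised-style objective replaces the reward model plus RL loop.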
RLHF alignment via PPO on human preferences – precursor to ChatGPT.
Few-shot in-context learning emerges at scale – no gradient update needed.
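An illustrative few-shot prompt: the task is specified entirely in-context through a handful of examples, and the model completes the pattern with no gradient update (the translation pairs here are the GPT-3 paper's classic example).

```python
# A few demonstrations in the prompt stand in for fine-tuning.
prompt = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"
    "cheese => fromage\n"
    "plush giraffe => girafe peluche\n"
    "mint => "
)
```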
Zero-shot multitask language model – deemed "too dangerous to release" at launch.
Bidirectional Transformer pre-training with masked language modeling (MLM) – GLUE SOTA across all tasks.
Attention Is All You Need – self-attention replaces RNNs entirely.
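The mechanism at the heart of the paper can be sketched as scaled dot-product attention, softmax(QKᵀ/√d_k)·V. This is a toy single-head, pure-Python version for clarity, not how production implementations are written.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # each query attends over all keys
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Every output position is a weighted mix of all value vectors, so dependencies of any distance are reached in one step, with no recurrence.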
Dates are approximate. Parameters are estimates where not officially confirmed.