AI Model Timeline

Key model releases from Transformer (2017) to today β€” filterable by category (LLMs, DLLMs, VLMs, Agents) and organization. β˜… = PaperTrace deep-dive available. πŸ”“ = open weights.

2026
Claude Opus 4.6β˜…Anthropic

Extended thinking, improved agentic capabilities β€” powers Claude Code. Released Feb 5, 2026.

02/2026
Claude Sonnet 4.6β˜…Anthropic

Frontier performance across coding, agents, and professional work. Released Feb 17, 2026.

02/2026
Gemini 2.0 Ultraβ˜…Google

Native multimodal reasoning + agentic tool use β€” Google's strongest model at launch

02/2026
DeepSeek-V3β˜…DeepSeek

Efficient MoE with novel load-balancing and multi-token prediction β€” strong coding and math at low training cost

2026
2025
Qwen3β˜…Alibaba235B MoE

Thinking mode toggle β€” top open-source reasoning model

04/2025
Gemini 2.5 Proβ˜…Google

Tops most benchmarks + Chatbot Arena β€” 1M token context

03/2025
DeepSeek-R1β˜…DeepSeek671B MoE

o1-level reasoning via RL, fully open-weights MIT license β€” shocked markets

2025
Kimi k1.5Moonshot AI

Long-context RL reasoning model β€” competitive with o1 on math/coding

2025
2024
Claude 3.5 Sonnetβ˜…Anthropic

Best coding model at launch β€” outperforms GPT-4o on most benchmarks

06/2024
Qwen2Alibaba72B

Strong multilingual model with excellent Chinese performance

05/2024
LLaMA 3β˜…Meta70B / 405B

Best open-weight model β€” competitive with GPT-4 Turbo on most tasks

04/2024
GRPO (DeepSeek-Math)DeepSeek

Group Relative Policy Optimization β€” simpler PPO without critic network

2024
2023
Gemini 1.0Google

Native multimodal β€” first credible GPT-4 competitor from Google

12/2023
Mistral 7BMistral AI7B

GQA + sliding window attention β€” beats LLaMA 2 13B at 7B scale

09/2023
LLaMA 2Meta70B

Commercial license, chat fine-tuned, widely adopted open-weight model

07/2023
DPOβ˜…Stanford

Direct Preference Optimization β€” eliminates reward model and PPO from RLHF

05/2023
GPT-4β˜…OpenAI

Multimodal, passes bar exam, massive capability leap over GPT-3.5

03/2023
LLaMA 1β˜…Meta65B

Open-weights LLM that outperforms GPT-3 β€” enables community research

02/2023
2022
ChatGPTβ˜…OpenAI

100M users in 2 months β€” AI enters mainstream consciousness

11/2022
InstructGPTβ˜…OpenAI175B

RLHF alignment via PPO on human preferences β€” precursor to ChatGPT

03/2022
2021
CodexOpenAI12B

GPT fine-tuned on GitHub code β€” powers GitHub Copilot

08/2021
2020
GPT-3β˜…OpenAI175B

Few-shot in-context learning emerges at scale β€” no gradient update needed

05/2020
2019
GPT-2β˜…OpenAI1.5B

Zero-shot multitask LM β€” "too dangerous to release" at launch

02/2019
2018
BERTβ˜…Google340M

Bidirectional Transformer pre-training with MLM β€” GLUE SOTA across all tasks

10/2018
2017
Transformerβ˜…Google

Attention Is All You Need β€” self-attention replaces RNNs entirely

06/2017

Dates are approximate. Parameters are estimates where not officially confirmed.