LLaDA: Large Language Diffusion with mAsking
Nie et al. Β· arXiv 2025
A masked diffusion framework for LLMs. Uses progressive masking as the forward process and learns to predict masked tokens in reverse, matching AR models at 8B scale.
Formulas broken down step by step. Walk-through examples with real numbers. Interactive visualizations you can poke at. No hand-waving.
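To make the forward process above concrete, here is a minimal sketch of progressive masking: each token is masked independently with probability t, so t=0 leaves the text intact and t=1 masks everything. The token names and `forward_mask` helper are illustrative, not from the paper's codebase.

```python
import random

MASK = "[MASK]"

def forward_mask(tokens, t, rng=random.Random(0)):
    """Masked-diffusion forward process (sketch): mask each token
    independently with probability t. The reverse model is trained
    to predict the original tokens at the masked positions."""
    return [MASK if rng.random() < t else tok for tok in tokens]

tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(forward_mask(tokens, 0.5))  # roughly half the tokens masked
```

At t=1.0 every position is `[MASK]`, which is the "all-noise" state the reverse process starts from.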
Sahoo et al. Β· NeurIPS 2024
Simplifies masked discrete diffusion with a principled continuous-time ELBO. Clean, minimal design with strong perplexity results.
Shi et al. Β· arXiv 2025
Accelerates discrete diffusion LMs with adaptive noise schedules and importance sampling, reducing denoising steps by 3-10x.
Arriola et al. Β· arXiv 2025
Generates text in blocks: blocks go left-to-right (AR), tokens within each block are denoised in parallel (diffusion). Best of both worlds.
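The block-wise scheme can be sketched as a decoding loop: an outer autoregressive pass over blocks, and inner parallel denoising steps that fill a block's masked positions. The `predict` callable and the commit schedule here are stand-ins for illustration, not the paper's actual sampler.

```python
MASK = -1  # sentinel for a masked token id (illustrative)

def block_diffusion_decode(predict, prompt, n_blocks, block_size, steps):
    """Block-diffusion sketch: blocks are generated left-to-right (AR);
    tokens inside each block are refined in parallel over `steps`
    denoising iterations. `predict` fills masked positions from context."""
    seq = list(prompt)
    for _ in range(n_blocks):
        block = [MASK] * block_size
        for step in range(steps):
            # denoise: predict all masked positions in the block at once
            filled = predict(seq + block)
            # commit a growing fraction of the block each step
            keep = block_size * (step + 1) // steps
            block = filled[len(seq):len(seq) + keep] + [MASK] * (block_size - keep)
        seq += block  # block is finalized; move on autoregressively
    return seq

def toy_predict(seq):
    # stand-in denoiser: fills each mask with its position index
    return [i if tok == MASK else tok for i, tok in enumerate(seq)]

print(block_diffusion_decode(toy_predict, [0, 1], n_blocks=2, block_size=2, steps=2))
# → [0, 1, 2, 3, 4, 5]
```

Real samplers commit tokens by model confidence rather than left-to-right, but the control flow is the same: AR across blocks, parallel refinement within one.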