Paper Feed
Curated ML papers with one-line takes on why they matter.
News & Events
Recent happenings in AI research.
Claude Code Source Leaked via npm Source Maps
Anthropic's Claude Code CLI source code was inadvertently exposed via npm source maps, revealing 1,884 TypeScript files across 36 folders. The leak exposed internal feature flags, unreleased agent modes (ultraplan, kairos-proactive), and architecture details. Anthropic has since patched the package.
ccleaks.com Analysis →
3 Security Flaws in Claude Code Allow Remote Code Execution
Check Point Research identified three vulnerabilities in Claude Code, including CVE-2025-59536 and CVE-2026-21852, that let attackers run arbitrary code and steal API keys via malicious repositories.
Check Point Research →
Claude Sonnet 4.6 Released
Anthropic released Claude Sonnet 4.6, delivering frontier performance across coding, agents, and professional work at scale.
Anthropic News →
Claude Opus 4.6 Released → Powers Claude Code
Anthropic released Claude Opus 4.6, an upgrade to its smartest model. It features extended thinking and improved agentic capabilities across coding, computer use, tool use, search, and finance, making it an industry-leading model for agentic tasks.
Anthropic News →
DeepSeek-V3 Released → Efficient MoE at Scale
DeepSeek released V3, a 671B mixture-of-experts model trained with novel load-balancing and multi-token prediction. Strong coding and math performance at a fraction of the training cost of comparable models.
DeepSeek Blog →
18 papers
Apr 2026
SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions
Ashima Suvarna et al. · 2026-04
Data curation framework for RLVR on natural instructions → generalizes RL-driven reasoning beyond math/code to open-ended everyday tasks.
Dynin-Omni: Omnimodal Unified Large Diffusion Language Model
Dynin AI et al. · 2026-04
First masked-diffusion omnimodal model unifying text, image, speech, and video → achieves strong results across 19 multimodal benchmarks.
Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM
Fast-dVLM Team et al. · 2026-04
Converts AR VLMs to block-diffusion with KV-cache-compatible parallel decoding → brings diffusion inference speedups to vision-language models.
Learning from the Right Rollouts: Data Attribution for PPO-based LLM Post-Training
I-PPO Team et al. · 2026-04
I-PPO uses gradient-based influence scores to filter RL rollouts → reduces unfaithful CoT reasoning and accelerates PPO post-training.
SHAPE: Stage-aware Hierarchical Advantage via Potential Estimation for LLM Reasoning
Zhengyang Ai et al. · 2026-04
Formalizes reasoning as state-space trajectories with hierarchical credit assignment → +3% accuracy and 30% fewer tokens by distinguishing efficient breakthroughs from mere verbosity.
DARE: Diffusion Large Language Models Alignment and Reinforcement Executor
Jingyi Yang et al. · 2026-04
First unified open framework for post-training diffusion LLMs → brings RLHF and alignment tooling to both masked and block diffusion models, accelerating reproducible research.
LightThinker++: From Reasoning Compression to Memory Management
Yuqi Zhu et al. · 2026-04
Upgrades LightThinker with explicit adaptive memory primitives → 70% peak token reduction and 26% faster inference by scheduling purposeful memory actions in long reasoning chains.
Rethinking Token Prediction: Tree-Structured Diffusion Language Model
Zihao Wu et al. · 2026-04
Exploits vocabulary hierarchy in masked diffusion → models intermediate latent states as ancestor nodes in a pre-built tree, enabling more structured and efficient training.
Dependency-Guided Parallel Decoding in Discrete Diffusion Language Models
Liran Ringel et al. · 2026-04
DEMASK predictor estimates pairwise token dependencies to guide safe parallel unmasking → fixes quality degradation from distributional mismatch in discrete diffusion decoding.
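The selection rule described in the take can be sketched greedily: unmask several positions in one step only if every pair among them has a low estimated dependency. This is a minimal illustration, not the paper's algorithm; `dependency(i, j)` is a hypothetical stand-in for the DEMASK predictor's pairwise estimate, and the threshold is an arbitrary choice.

```python
def safe_parallel_unmask(candidates, dependency, threshold=0.1):
    """Greedily pick a subset of masked positions to unmask in parallel
    such that every pair in the subset has estimated dependency below
    `threshold` (sketch; `dependency` is a hypothetical predictor)."""
    chosen = []
    for i in candidates:
        # keep i only if it is (approximately) independent of all picks so far
        if all(dependency(i, j) < threshold for j in chosen):
            chosen.append(i)
    return chosen
```

Positions rejected in this step stay masked and become candidates again in the next denoising step, which is how the quality/parallelism tradeoff is managed.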
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
EverMind AI · 2026-04
Scalable sparse attention + document-wise RoPE achieves near-linear complexity at 100M tokens → only 9% quality degradation vs 16K baseline, beating SOTA RAG and memory agents.
May 2025
Fast-dLLM: Training-free Acceleration of Diffusion LLM
Wu et al. · 2025-05
Practical 3-10x speedup for masked diffusion LMs with no retraining → makes diffusion LMs viable for production.
Feb 2025
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Arriola et al. · 2025-02
Elegant bridge between AR and diffusion → block size B lets you tune the speed/quality tradeoff continuously.
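The decoding pattern behind that tradeoff is easy to sketch: autoregressive across blocks, parallel iterative denoising within each block of size B. This is a toy illustration of the control flow only; `denoise_block` is a hypothetical callback, not the paper's interface.

```python
def block_diffusion_generate(denoise_block, num_blocks, block_size, steps,
                             mask_token="<mask>"):
    """Block diffusion decoding sketch: generate block-by-block (AR across
    blocks), refining each block in parallel for `steps` denoising steps.
    block_size=1 recovers pure AR; one giant block recovers pure diffusion."""
    sequence = []                              # committed tokens so far
    for _ in range(num_blocks):
        block = [mask_token] * block_size      # each block starts fully masked
        for step in range(steps):
            # hypothetical denoiser: refines the whole block at once,
            # conditioned on all previously committed blocks
            block = denoise_block(sequence, block, step)
        sequence.extend(block)                 # commit; next block conditions on it
    return sequence
```

Sliding B between those two extremes is exactly the continuous speed/quality dial the take refers to.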
Large Language Diffusion Models (LLaDA)
Nie et al. · 2025-02
First diffusion LM at 8B scale that matches LLaMA3 → proves AR is not the only path to powerful LLMs.
Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via RL
DeepSeek AI · 2025-01
Matches o1 on math/code using pure RL with GRPO → no supervised CoT data needed. Huge open-source release.
Jun 2024
Simple and Effective Masked Diffusion Language Models (MDLM)
Sahoo et al. · 2024-06
Clean theoretical foundation for masked diffusion → derives the training loss from first principles, no hand-tuning.
Feb 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning (GRPO)
Shao et al. · 2024-02
Introduces GRPO → removes the critic network from PPO by using group-relative rewards. Powers DeepSeek-R1.
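The group-relative trick is small enough to show: sample a group of completions per prompt, then standardize each reward within the group, so the group statistics play the role of PPO's learned value baseline. A minimal sketch (the stability epsilon is my choice, not from the paper):

```python
import math

def grpo_advantages(rewards):
    """Group-relative advantages: standardize each sampled completion's
    reward against its own group's mean and std. The group mean replaces
    the critic/value network's baseline estimate."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + 1e-8) for r in rewards]  # eps: assumption
```

These advantages then plug into the usual PPO-style clipped policy-gradient update, with no separate value model to train.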
May 2023
Direct Preference Optimization (DPO)
Rafailov et al. · 2023-05
Killed the reward model → rewrites RLHF as a simple binary loss. Became the default fine-tuning method for open-source LLMs.
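That "simple binary loss" is logistic regression on the beta-scaled margin between the chosen and rejected responses' policy-vs-reference log-ratios. A sketch over per-sequence log-probabilities; beta=0.1 is a common default, not part of the paper's equation:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid of the beta-scaled
    margin between the chosen and rejected responses' log-ratios
    (policy minus frozen reference). No reward model involved."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return math.log1p(math.exp(-margin))  # -log(sigmoid(margin))
```

Driving the margin up is exactly what the implicit reward model in RLHF would have done, which is why the reward-model stage can be dropped.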
Jun 2017
Attention Is All You Need
Vaswani et al. · 2017-06
The paper that started it all → replaced RNNs with attention. Everything in this repo builds on it.
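The core operation is one line of math, softmax(QKᵀ/√d)V. A single-query version written out in plain Python, purely for illustration (no batching, no multi-head projection):

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector:
    weights = softmax(q . k / sqrt(d)) over keys, output = weights @ V."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    peak = max(scores)                           # max-subtraction: stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]          # attention distribution over keys
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]     # weighted sum of value vectors
```

Keys similar to the query get most of the weight, so the output is a content-addressed mixture of the values.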