Paper Feed

Curated ML papers with one-line takes on why they matter.

News & Events

Recent happenings in AI research.

2026-04-04 · Industry · LATEST

Claude Code Source Leaked via npm Source Maps

Anthropic's Claude Code CLI source code was inadvertently exposed via npm source maps, revealing 1,884 TypeScript files across 36 folders. The leak exposed internal feature flags, unreleased agent modes (ultraplan, kairos-proactive), and architecture details. Anthropic has since patched the package.

ccleaks.com Analysis ↗
2026-04-03 · Industry

3 Security Flaws in Claude Code Allow Remote Code Execution

Check Point Research identified three vulnerabilities in Claude Code, including CVE-2025-59536 and CVE-2026-21852, that let attackers execute arbitrary code and steal API keys via malicious repositories.

Check Point Research ↗
2026-02-17 · Release

Claude Sonnet 4.6 Released

Anthropic released Claude Sonnet 4.6, delivering frontier performance across coding, agents, and professional work at scale.

Anthropic News ↗
2026-02-05 · Release

Claude Opus 4.6 Released β€” Powers Claude Code

Anthropic released Claude Opus 4.6, an upgrade to its most capable model. It features extended thinking and improved agentic capabilities across coding, computer use, tool use, search, and finance, making it an industry-leading model for agentic tasks.

Anthropic News ↗
2026-01-26 · Release

DeepSeek-V3 Released β€” Efficient MoE at Scale

DeepSeek released V3, a 671B mixture-of-experts model trained with novel load-balancing and multi-token prediction. Strong coding and math performance at a fraction of the training cost of comparable models.

DeepSeek Blog ↗

18 papers

Apr 2026

⭐

SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions

Ashima Suvarna et al. Β· 2026-04

Data curation framework for RLVR on natural instructions β€” generalizes RL-driven reasoning beyond math/code to open-ended everyday tasks.

·

Dynin-Omni: Omnimodal Unified Large Diffusion Language Model

Dynin AI et al. Β· 2026-04

First masked-diffusion omnimodal model unifying text, image, speech, and video β€” achieves strong results across 19 multimodal benchmarks.

·

Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM

Fast-dVLM Team et al. Β· 2026-04

Converts AR VLMs to block-diffusion with KV-cache-compatible parallel decoding β€” brings diffusion inference speedups to vision-language models.

·

Learning from the Right Rollouts: Data Attribution for PPO-based LLM Post-Training

I-PPO Team et al. Β· 2026-04

I-PPO uses gradient-based influence scores to filter RL rollouts β€” reduces unfaithful CoT reasoning and accelerates PPO post-training.

⭐

SHAPE: Stage-aware Hierarchical Advantage via Potential Estimation for LLM Reasoning

Zhengyang Ai et al. Β· 2026-04

Formalizes reasoning as state-space trajectories with hierarchical credit assignment β€” +3% accuracy and 30% fewer tokens by distinguishing efficient breakthroughs from mere verbosity.

⭐

DARE: Diffusion Large Language Models Alignment and Reinforcement Executor

Jingyi Yang et al. Β· 2026-04

First unified open framework for post-training diffusion LLMs β€” brings RLHF and alignment tooling to both masked and block diffusion models, accelerating reproducible research.

⭐

LightThinker++: From Reasoning Compression to Memory Management

Yuqi Zhu et al. Β· 2026-04

Upgrades LightThinker with explicit adaptive memory primitives β€” 70% peak token reduction and 26% faster inference by scheduling purposeful memory actions in long reasoning chains.

·

Rethinking Token Prediction: Tree-Structured Diffusion Language Model

Zihao Wu et al. Β· 2026-04

Exploits vocabulary hierarchy in masked diffusion β€” models intermediate latent states as ancestor nodes in a pre-built tree, enabling more structured and efficient training.

·

Dependency-Guided Parallel Decoding in Discrete Diffusion Language Models

Liran Ringel et al. Β· 2026-04

DEMASK predictor estimates pairwise token dependencies to guide safe parallel unmasking β€” fixes quality degradation from distributional mismatch in discrete diffusion decoding.

·

MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

EverMind AI Β· 2026-04

Scalable sparse attention + document-wise RoPE achieves near-linear complexity at 100M tokens β€” only 9% quality degradation vs 16K baseline, beating SOTA RAG and memory agents.

May 2025

⭐

Fast-dLLM: Training-free Acceleration of Diffusion LLM

Wu et al. Β· 2025-05

Practical 3-10x speedup for masked diffusion LMs with no retraining β€” makes diffusion LMs viable for production.
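The core trick behind this kind of training-free speedup can be sketched simply: at each step, unmask in parallel every masked position the model is confident about. The sketch below is illustrative, not Fast-dLLM's actual code; `predict`, `tau`, and the data shapes are assumptions.

```python
# Hedged sketch of confidence-thresholded parallel unmasking.
# `predict` is a stand-in for the diffusion LM: given the partly
# masked sequence, it returns {position: {token: probability}}.
MASK = None

def parallel_unmask_step(tokens, predict, tau=0.9):
    """Unmask every masked position whose top predicted probability
    exceeds tau; if none qualify, commit only the single most
    confident position so decoding always makes progress."""
    probs = predict(tokens)
    masked = [i for i, t in enumerate(tokens) if t is MASK]
    best = {i: max(probs[i].items(), key=lambda kv: kv[1]) for i in masked}
    confident = [i for i in masked if best[i][1] >= tau]
    if not confident and masked:
        confident = [max(masked, key=lambda i: best[i][1])]
    return [best[i][0] if i in confident else t
            for i, t in enumerate(tokens)]
```

Fewer model calls per generated token is where the 3-10x comes from: confident positions are committed in batches instead of one denoising step each.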

Feb 2025

⭐

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Arriola et al. Β· 2025-02

Elegant bridge between AR and diffusion — the block size B is a single knob for trading off decoding speed against quality.

⭐

Large Language Diffusion Models (LLaDA)

Nie et al. Β· 2025-02

First diffusion LM at 8B scale that matches LLaMA3 β€” proves AR is not the only path to powerful LLMs.

Jan 2025

⭐

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via RL

DeepSeek AI Β· 2025-01

Matches o1 on math/code using pure RL with GRPO β€” no supervised CoT data needed. Huge open-source release.

Jun 2024

·

Simple and Effective Masked Diffusion Language Models (MDLM)

Sahoo et al. Β· 2024-06

Clean theoretical foundation for masked diffusion β€” derives the training loss from first principles, no hand-tuning.
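That derived loss comes out remarkably simple. Under a linear masking schedule it reduces to cross-entropy on the masked tokens, weighted by the inverse mask level — a hedged sketch (names and the injected `rng` are illustrative, not the paper's code):

```python
# Hedged sketch of an MDLM-style objective, assuming a linear masking
# schedule so the NELBO weight is 1/t for mask level t ~ U(0, 1).
import random

def masked_diffusion_loss(x, logprobs_fn, rng=random):
    t = rng.uniform(1e-3, 1.0)                       # sampled mask level
    masked = [i for i in range(len(x)) if rng.random() < t]
    if not masked:
        return 0.0
    # logprobs_fn is a stand-in for the model: log p(x_i | unmasked x)
    logp = logprobs_fn(x, masked)
    return (1.0 / t) * sum(-logp[i] for i in masked)
```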

Feb 2024

·

DeepSeekMath: Pushing the Limits of Mathematical Reasoning (GRPO)

Shao et al. Β· 2024-02

Introduces GRPO β€” removes the critic network from PPO by using group-relative rewards. Powers DeepSeek-R1.
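The critic-free trick fits in a few lines: sample a group of completions per prompt and normalize each reward against the group's statistics. A minimal sketch (function and variable names are illustrative, not DeepSeek's implementation):

```python
# Hedged sketch of GRPO's group-relative advantage.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Advantage of rollout i is (r_i - mean) / std over its group,
    replacing PPO's learned value/critic network."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four rollouts for one prompt, scored by a verifiable 0/1 reward:
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

These advantages then plug into the usual clipped policy-gradient objective; no value network means roughly half the memory of standard PPO.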

May 2023

⭐

Direct Preference Optimization (DPO)

Rafailov et al. Β· 2023-05

Killed the reward model β€” rewrites RLHF as a simple binary loss. Became the default fine-tuning method for open-source LLMs.
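That "simple binary loss" is a logistic loss on the margin between implicit rewards, where a response's implicit reward is β times its policy-vs-reference log-probability ratio. A hedged sketch for one preference pair (log-probabilities are placeholders the caller supplies):

```python
# Hedged sketch of the DPO objective for a single (chosen, rejected) pair.
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """-log sigmoid of the implicit-reward margin: pushes the chosen
    response's ratio above the rejected one's, no reward model needed."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```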

Jun 2017

·

Attention Is All You Need

Vaswani et al. Β· 2017-06

The paper that started it all β€” replaced RNNs with attention. Everything in this repo builds on it.
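The mechanism it introduced, scaled dot-product attention, is small enough to write out in full: softmax(QKᵀ/√d)V. A pure-Python sketch for clarity (real implementations are batched, multi-headed tensor ops):

```python
# Minimal scaled dot-product attention over 2-D lists of row vectors.
import math

def attention(Q, K, V):
    """Each query attends over all keys; output rows are
    softmax-weighted averages of the value rows."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        m = max(scores)                        # stabilize the softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```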