Paper Feed
Curated ML papers with one-line takes on why they matter.
News & Events
Recent happenings in AI research.
DeepSeek-V4 Officially Open-Sourced: 1.6T MoE, 1M Context
DeepSeek-V4 is officially live and open-sourced. V4-Pro: 1.6T total / 49B active params, 1M context, 80.6% on SWE-bench Verified, 93.5 on LiveCodeBench, Codeforces rating 3206. V4-Flash: 284B total / 13B active, same 1M context. Both support Expert Mode (Think) and Instant Mode (Non-Think). Pre-trained on 32T+ tokens with the Muon optimizer. Weights on Hugging Face, API available, tech report released.
DeepSeek Official
GPT Image 2.0: Reasoning-First Image Generation
OpenAI launched gpt-image-2 (ChatGPT Images 2.0) with integrated o-series reasoning: the model plans and reasons before generating. It supports up to 8 coherent images per prompt, accurate multilingual text rendering (Chinese, Japanese, Korean), and 2K resolution. It hit #1 on Image Arena within 12 hours by a 242-point margin.
OpenAI Blog
Claude Code Source Leaked via npm Source Maps
Anthropic's Claude Code CLI source code was inadvertently exposed via npm source maps, revealing 1,884 TypeScript files across 36 folders. The leak exposed internal feature flags, unreleased agent modes (ultraplan, kairos-proactive), and architecture details. Anthropic has since patched the package.
ccleaks.com Analysis
3 Security Flaws in Claude Code Allow Remote Code Execution
Check Point Research identified three vulnerabilities in Claude Code, including CVE-2025-59536 and CVE-2026-21852, that allow attackers to run arbitrary code and steal API keys via malicious repositories.
Check Point Research
Claude Sonnet 4.6 Released
Anthropic released Claude Sonnet 4.6, delivering frontier performance across coding, agents, and professional work at scale.
Anthropic News
35 papers
Apr 2026
DeepSeek-V4 Technical Report
DeepSeek AI · 2026-04
1.6T MoE (49B active), 1M context, Muon optimizer; 80.6% SWE-bench Verified, 93.5 LiveCodeBench, Codeforces 3206. Open weights, MIT license.
Introspective Diffusion Language Models
Yifan Yu et al. · 2026-04
Identifies introspective inconsistency as the root quality gap in diffusion LMs; ISD verifies past tokens while advancing new ones in one forward pass.
Accelerating Speculative Decoding with Block Diffusion Draft Trees
Liran Ringel and Yaniv Romano · 2026-04
DDTree builds a best-first draft tree from block diffusion per-position distributions: SOTA speculative decoding, verified in one target-model forward pass.
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
NVIDIA et al. · 2026-04
NVIDIA's 120B/12B-active MoE hybrid Mamba-Transformer, trained on 25T tokens; SOTA on agentic benchmarks via native speculative decoding.
Thinking Fast, Thinking Wrong: Intuitiveness Modulates LLM Counterfactual Reasoning in Policy Evaluation
Yanjie He et al. · 2026-04
CoT paradox: chain-of-thought helps obvious cases but nearly fails counter-intuitive ones; intuitiveness dominates LLM reasoning reliability.
SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks
Tianyi Wang et al. · 2026-04
Reformulates PPO as a sequence-level contextual bandit: no value model needed, matches GRPO-style methods with far lower memory cost.
RAGEN-2: Reasoning Collapse in Agentic RL
Zihan Wang et al. · 2026-04
Discovers template collapse, a failure mode invisible to entropy in agentic RL, fixed with SNR-aware prompt filtering.
DFlash: Block Diffusion for Flash Speculative Decoding
Jian Chen et al. · 2026-04
Block diffusion drafter achieves 6x lossless speedup over the base LLM; 2.5x faster than EAGLE-3 with no quality loss.
DFlash: Block Diffusion for Flash Speculative Decoding
Jian Chen et al. · 2026-04
Block diffusion as a speculative draft model achieves 6x LLM inference speedup; bridges diffusion language models and fast autoregressive deployment.
Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM
Chengyue Wu et al. · 2026-04
Direct AR-to-diffusion VLM conversion with KV-cache-compatible parallel decoding; matches AR quality across 11 multimodal benchmarks at lower inference cost.
SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions
Ashima Suvarna et al. · 2026-04
Mines natural instruction datasets for verifiable rewards; extends RLVR beyond math/code to causal, temporal, and abductive reasoning without hand-crafted reward functions.
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression
Weian Mao et al. · 2026-04
Trigonometric KV compression matches full-attention accuracy at 2.5x throughput or 10.7x memory reduction; practical long-reasoning speedup from MIT/NVIDIA.
Multi-objective Evolutionary Merging Enables Efficient Reasoning Models
Mario Iacobelli et al. · 2026-04
Evolutionary merging of reasoning and base models eliminates overthinking; cuts inference cost on easy problems without sacrificing accuracy.
OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training
Haiyue Song et al. · 2026-04
Trains one model per dataset, then merges distribution vectors with Bayesian-optimized weights; beats data-mixing CPT on Gemma 3 27B with 15–35x lower search cost.
DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Efficient MoE Inference on Edge
Yuegui Huang et al. · 2026-04
Importance-aware dynamic quantization + look-ahead prefetching for MoE on commodity edge hardware; 3.4–22.7x TTFT speedup without accuracy loss.
Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing
Wenhao Yuan et al. · 2026-04
SAVeR adversarially audits internal belief states before action commitment; stops unsupported beliefs from cascading across memory and decision steps in long-horizon agents. ACL 2026.
Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets
Dat Tran et al. · 2026-04
Information-theoretic proof via the Data Processing Inequality: single-agent LLMs are more token-efficient on multi-hop reasoning; reported MAS gains trace to uncontrolled compute.
SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions
Ashima Suvarna et al. · 2026-04
Data curation framework for RLVR on natural instructions; generalizes RL-driven reasoning beyond math/code to open-ended everyday tasks.
Dynin-Omni: Omnimodal Unified Large Diffusion Language Model
Dynin AI et al. · 2026-04
First masked-diffusion omnimodal model unifying text, image, speech, and video; achieves strong results across 19 multimodal benchmarks.
Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM
Fast-dVLM Team et al. · 2026-04
Converts AR VLMs to block-diffusion with KV-cache-compatible parallel decoding; brings diffusion inference speedups to vision-language models.