Paper Feed

Curated ML papers with one-line takes on why they matter.

News & Events

Recent happenings in AI research.

2026-04-23 · Release · LATEST

DeepSeek-V4 Officially Open-Sourced: 1.6T MoE, 1M Context

DeepSeek-V4 is officially live and open-sourced. V4-Pro: 1.6T total / 49B active params, 1M context, 80.6% on SWE-bench Verified, 93.5 on LiveCodeBench, Codeforces rating 3206. V4-Flash: 284B total / 13B active, same 1M context. Both support Expert Mode (Think) and Instant Mode (Non-Think). Pre-trained on 32T+ tokens with the Muon optimizer. Weights are on HuggingFace, the API is available, and the tech report has been released.

DeepSeek Official ↗

2026-04-21 · Release

GPT Image 2.0: Reasoning-First Image Generation

OpenAI launched gpt-image-2 (ChatGPT Images 2.0) with integrated o-series reasoning: the model plans and reasons before generating. It supports up to 8 coherent images per prompt, accurate multilingual text rendering (Chinese, Japanese, Korean), and 2K resolution. It hit #1 on Image Arena within 12 hours by a 242-point margin.

OpenAI Blog ↗

2026-04-04 · Industry

Claude Code Source Leaked via npm Source Maps

Anthropic's Claude Code CLI source code was inadvertently exposed via npm source maps, revealing 1,884 TypeScript files across 36 folders. The leak exposed internal feature flags, unreleased agent modes (ultraplan, kairos-proactive), and architecture details. Anthropic has since patched the package.

ccleaks.com Analysis ↗

2026-04-03 · Industry

3 Security Flaws in Claude Code Allow Remote Code Execution

Check Point Research identified three vulnerabilities in Claude Code (tracked under CVE-2025-59536 and CVE-2026-21852) that allow attackers to run arbitrary code and steal API keys via malicious repositories.

Check Point Research ↗

2026-02-17 · Release

Claude Sonnet 4.6 Released

Anthropic released Claude Sonnet 4.6, delivering frontier performance across coding, agents, and professional work at scale.

Anthropic News ↗

35 papers

Apr 2026

⭐

DeepSeek-V4 Technical Report

DeepSeek AI Β· 2026-04

1.6T MoE (49B active), 1M context, Muon optimizer β€” 80.6% SWE Verified, 93.5 LiveCodeBench, Codeforces 3206. Open weights, MIT license.

⭐

Introspective Diffusion Language Models

Yifan Yu et al. Β· 2026-04

Identifies introspective inconsistency as the root quality gap in diffusion LMs; ISD verifies past tokens while advancing new ones in one forward pass.

·

Accelerating Speculative Decoding with Block Diffusion Draft Trees

Liran Ringel and Yaniv Romano · 2026-04

DDTree builds a best-first draft tree from the per-position distributions of a block diffusion drafter and verifies it in a single target-model forward pass, for SOTA speculative decoding.

·

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

NVIDIA et al. · 2026-04

NVIDIA's 120B-total / 12B-active MoE hybrid Mamba-Transformer, trained on 25T tokens; reports SOTA agentic benchmark results with native speculative decoding.

·

Thinking Fast, Thinking Wrong: Intuitiveness Modulates LLM Counterfactual Reasoning in Policy Evaluation

Yanjie He et al. · 2026-04

CoT paradox: chain-of-thought helps on obvious cases but nearly fails on counter-intuitive ones; intuitiveness dominates LLM reasoning reliability.

⭐

SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks

Tianyi Wang et al. Β· 2026-04

Reformulates PPO as sequence-level contextual bandit β€” no value model needed, matches GRPO-style methods with far lower memory cost.

·

RAGEN-2: Reasoning Collapse in Agentic RL

Zihan Wang et al. · 2026-04

Discovers template collapse, a failure mode in agentic RL that is invisible to entropy, and fixes it with SNR-aware prompt filtering.

·

DFlash: Block Diffusion for Flash Speculative Decoding

Jian Chen et al. · 2026-04

Block diffusion drafter achieves a 6x lossless speedup over the base LLM and runs 2.5x faster than EAGLE-3.

·

DFlash: Block Diffusion for Flash Speculative Decoding

Jian Chen et al. · 2026-04

Block diffusion as a speculative draft model achieves a 6x LLM inference speedup, bridging diffusion language models and fast autoregressive deployment.

·

Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM

Chengyue Wu et al. · 2026-04

Direct AR-to-diffusion VLM conversion with KV-cache-compatible parallel decoding; matches AR quality across 11 multimodal benchmarks at lower inference cost.

·

SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions

Ashima Suvarna et al. · 2026-04

Mines natural instruction datasets for verifiable rewards, extending RLVR beyond math/code to causal, temporal, and abductive reasoning without hand-crafted reward functions.

⭐

TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

Weian Mao et al. Β· 2026-04

Trigonometric KV compression matches full-attention accuracy at 2.5x throughput or 10.7x memory reduction β€” practical long-reasoning speedup from MIT/NVIDIA.

·

Multi-objective Evolutionary Merging Enables Efficient Reasoning Models

Mario Iacobelli et al. · 2026-04

Evolutionary merging of reasoning and base models eliminates overthinking, cutting inference cost on easy problems without sacrificing accuracy.

⭐

OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training

Haiyue Song et al. Β· 2026-04

Trains one model per dataset then merges distribution vectors with Bayesian-optimized weights β€” beats data-mixing CPT on Gemma 3 27B with 15–35x lower search cost.

·

DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Efficient MoE Inference on Edge

Yuegui Huang et al. · 2026-04

Importance-aware dynamic quantization plus look-ahead prefetching for MoE on commodity edge hardware; 3.4–22.7x TTFT speedup without accuracy loss.

·

Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing

Wenhao Yuan et al. · 2026-04

SAVeR adversarially audits internal belief states before action commitment, stopping unsupported beliefs from cascading across memory and decision steps in long-horizon agents. ACL 2026.

·

Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets

Dat Tran et al. · 2026-04

Information-theoretic proof via the Data Processing Inequality: single-agent LLMs are more token-efficient on multi-hop reasoning, and reported MAS gains trace to uncontrolled compute.

⭐

SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions

Ashima Suvarna et al. Β· 2026-04

Data curation framework for RLVR on natural instructions β€” generalizes RL-driven reasoning beyond math/code to open-ended everyday tasks.

·

Dynin-Omni: Omnimodal Unified Large Diffusion Language Model

Dynin AI et al. · 2026-04

First masked-diffusion omnimodal model unifying text, image, speech, and video; achieves strong results across 19 multimodal benchmarks.

·

Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM

Fast-dVLM Team et al. · 2026-04

Converts AR VLMs to block diffusion with KV-cache-compatible parallel decoding, bringing diffusion inference speedups to vision-language models.