Sarmadi AI Digest · June 9, 2026 · Updated 7:00 AM CT

Agents' Last Exam, FlashMemory for ultra-long context, and why Muon beats Adam

Three papers do the day's heavy lifting. Agents' Last Exam asks why benchmark wins don't translate into real-world outcomes and proposes harder, broader tests. FlashMemory-DeepSeek-V4 makes ultra-long-context inference workable with lookahead sparse attention, attacking the KV-cache bottleneck that the silicon market repriced last week. Why Muon Outperforms Adam offers a curvature-based explanation for Muon's roughly 2x training-efficiency gain over Adam. Elsewhere, SpatialWorld and Skill-3D push interactive spatial benchmarks toward real-world tasks, and Echo-Memory tests memory mechanisms inside action world models. The press cycle was quiet.
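The KV-cache bottleneck is easy to see with back-of-the-envelope arithmetic: under standard attention, every cached token stores one key vector and one value vector per layer and per KV head, so cache memory grows linearly with context length. The sketch below sizes the cache for a hypothetical grouped-query-attention model. The layer count, KV-head count, head dimension, and 2-byte precision are illustrative assumptions, and `kv_cache_bytes` is a name we made up. It does not model DeepSeek-V4's actual attention or cache layout, nor FlashMemory's lookahead sparse attention.

```python
"""Back-of-the-envelope KV-cache sizing for a hypothetical decoder-only model.

Assumes standard grouped-query attention caching (one K and one V vector per
layer per KV head per token). The config is illustrative, not DeepSeek-V4's.
"""

GIB = 1024 ** 3


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    # Factor of 2: every cached token stores both a key and a value vector.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    # Linear in sequence length and batch size.
    return per_token * seq_len * batch


# Hypothetical config: 60 layers, 8 KV heads, head_dim 128, fp16/bf16 (2 bytes).
CONFIG = dict(num_layers=60, num_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for seq_len in (128 * 1024, 1024 * 1024):
        gib = kv_cache_bytes(seq_len=seq_len, **CONFIG) / GIB
        print(f"{seq_len:>9,} tokens -> {gib:6.1f} GiB of KV cache")
```

With this made-up configuration each token costs 240 KiB, so a single sequence needs 30 GiB of cache at 128K tokens and 240 GiB at 1M tokens, before counting weights or activations. That linear growth is the pressure cache-compression and sparse-attention approaches are aimed at.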

14 papers · 0 news · 1 source

Papers

10 items

Agents saturate, then harden

Agents' Last Exam names the gap between benchmark gains and real-world outcomes; SWE-Explore probes how coding agents actually navigate large repositories; SlimSearcher cuts the cost of deep-research agents with adaptive reward gating; DuMate-DeepResearch builds an auditable, rubric-grounded multi-agent system.

Paper · Hugging Face

Agents' Last Exam

A position-and-benchmark paper arguing that strong scores on existing agent suites don't translate into real-world outcomes, and proposing harder, broader evaluation.

Why it matters
  • Names the leaderboard-vs-deployment gap that two weeks of agent benchmarks have been edging toward.
  • Likely to become the next reference benchmark that vendors are graded against.
  • Aligns with RAMP and TASTE: three converging arguments that current agent benchmarks are saturated.

Long-context inference and training-theory wins

FlashMemory-DeepSeek-V4 ships lookahead sparse attention for ultra-long context; End-to-End Context Compression at Scale attacks KV-cache growth structurally; Why Muon Outperforms Adam gives a curvature-based explanation for the optimizer that has been quietly displacing Adam in pretraining.
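For readers who haven't met the optimizer: Muon (MomentUm Orthogonalized by Newton-Schulz) keeps a momentum buffer for each 2-D weight matrix and replaces it with an approximately orthogonal matrix before applying the update, rather than rescaling individual coordinates the way Adam does. The sketch below illustrates only that mechanism; it does not reproduce the paper's curvature argument. Assumptions: plain (non-Nesterov) momentum, the textbook cubic Newton-Schulz iteration instead of the tuned quintic one used by widely shared open-source Muon code, no shape-dependent learning-rate scaling, and illustrative hyperparameters. The function names are hypothetical.

```python
"""Minimal Muon-style update for one 2-D weight matrix (illustrative sketch).

Assumptions (not from the paper): plain momentum, the textbook cubic
Newton-Schulz iteration, no shape-dependent learning-rate scaling, and
made-up default hyperparameters.
"""
import numpy as np


def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 30) -> np.ndarray:
    """Approximate the polar factor U @ V^T of g = U @ diag(s) @ V^T."""
    transposed = g.shape[0] > g.shape[1]
    x = g.T if transposed else g          # iterate on the wide orientation
    x = x / (np.linalg.norm(x) + 1e-12)   # Frobenius scaling => all s <= 1
    for _ in range(steps):
        # Maps each singular value s -> 1.5*s - 0.5*s**3 (fixed point s = 1)
        # while leaving the singular vectors unchanged.
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x.T if transposed else x


def muon_step(w, grad, momentum_buf, lr=0.02, beta=0.95, ns_steps=30):
    """One hypothetical Muon-style step; returns (new_weights, new_buffer)."""
    momentum_buf = beta * momentum_buf + grad           # accumulate momentum
    update = newton_schulz_orthogonalize(momentum_buf, ns_steps)
    return w - lr * update, momentum_buf                # orthogonalized step


if __name__ == "__main__":
    g = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    o = newton_schulz_orthogonalize(g)
    print(np.round(o.T @ o, 6))  # approximately the 2x2 identity
```

Because the iteration pushes every nonzero singular value toward 1 without touching the singular vectors, the applied update is the polar factor of the momentum matrix: each direction gets the same step size regardless of how large its gradient component was. The quintic variant reaches this in far fewer steps, at the cost of only approximately orthogonal output.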

Spatial benchmarks and action world models

SpatialWorld pushes interactive spatial reasoning into real-world tasks; Skill-3D evolves agentic 3D skills; Latent Spatial Memory and Echo-Memory study memory in action world models; AHA-WAM proposes asynchronous horizon-adaptive world-action modeling.

Also today