Sarmadi AI Digest, June 8, 2026. Updated 7:00 AM CT.

Agent reliability under noise; open-world self-evolution

A research-led Monday after a quiet weekend. The agent-reliability thread sharpened: 'When Tools Fail' benchmarks dynamic replanning and anomaly recovery, ResearchClawBench evaluates end-to-end autonomous research, SubtleMemory tests fine-grained relational memory in long-horizon agents, and OpenSkill proposes open-world self-evolution for LLM agents that does not assume a usable learning loop. Spatial reasoning came back into focus with Stream3D-VLM and Imaginative Perception Tokens. 'Robots Need More than VLA and World Models' argues against the framing that policy scaling is enough for generalist robot intelligence. No press cycle on the day.

14 papers, 0 news, 1 source

Papers

9 items

Agent reliability under noise

Four papers tighten what production-grade agent reliability means: dynamic replanning when tools fail, end-to-end autonomous scientific research, fine-grained relational memory over long horizons, and open-world self-evolution where the learning loop isn't guaranteed.

Spatial reasoning pushes back

Stream3D-VLM brings 3D understanding online with incremental geometry priors, Imaginative Perception Tokens enhance VLM spatial reasoning, and Thinking with Imagination uses world simulators for agentic visual spatial reasoning. The week's earlier 'VLMs don't actually know spatial' critique is being answered.

Robots need more than VLA + world models

A position paper argues generalist robot intelligence is not just a policy-scaling problem; AnchorWorld extends interactive world modeling with view-based evolution; Direct 3D-Aware Object Insertion ships a clean compositing tool.

Also today