Inference economics moves to center stage
Monday's research and industry signals converge on one question: who pays for inference, and where does it run? Stratechery names the shift explicitly; the renewed framing of CUDA as Nvidia's software moat, together with Nvidia's $40B in equity deals, puts a price on it; and a top-of-Hacker-News essay argues local AI should be the default.

On the research side, three independent papers attack long-context inference cost at different layers: shallow-prefill KV visibility, block-iterative speculative decoding, and an analysis of state-tracking error control in linear models.

Agent safety quietly graduated from manifesto to tooling: a red-team platform, a prefix-trace failure monitor, a skill compiler with built-in injection checks, and an SAE-based firewall for VLMs all arrived on the same day. A separate result reframes RL for reasoning as sparse policy selection rather than capability acquisition, recovering most of RL's gains at three orders of magnitude lower cost.
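For readers unfamiliar with the technique behind the second paper's approach, here is a minimal sketch of plain (greedy) speculative decoding, the general family that block-iterative variants build on. The `draft_next` and `target_next` functions are hypothetical toy stand-ins, not the paper's models or method; the point is only the structure of the loop: a cheap draft model proposes a block of tokens, and the expensive target model verifies the whole block in one pass.

```python
# Illustrative greedy speculative decoding with toy stand-in "models".
# Both model functions below are hypothetical, chosen so the example is
# deterministic and runnable; a real system would call two LMs.

def draft_next(context):
    # Cheap draft model: a toy deterministic rule (stand-in for a small LM).
    return (context[-1] + 1) % 10

def target_next(context):
    # Expensive target model: agrees with the draft except when the
    # context length is a multiple of 4, to force occasional rejections.
    nxt = (context[-1] + 1) % 10
    return nxt if len(context) % 4 else (nxt + 5) % 10

def speculative_decode(context, n_tokens, k=4):
    """Generate n_tokens; return (sequence, number of target passes)."""
    context = list(context)
    target_calls = 0
    while n_tokens > 0:
        # 1) Draft up to k tokens autoregressively with the cheap model.
        proposal, ctx = [], list(context)
        for _ in range(min(k, n_tokens)):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) One target pass verifies all drafted positions (simulated
        #    position-by-position here; a real LM scores them in one batch).
        target_calls += 1
        accepted, ctx = [], list(context)
        for t in proposal:
            correct = target_next(ctx)
            if correct == t:
                accepted.append(t)
                ctx.append(t)
            else:
                # 3) First disagreement: keep the target's token, discard
                #    the rest of the draft block.
                accepted.append(correct)
                break
        context.extend(accepted)
        n_tokens -= len(accepted)
    return context, target_calls
```

The output is identical to decoding with the target model alone; the savings come from verifying a block per target pass instead of one token per pass, which is exactly the cost axis the long-context papers are attacking.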