Sarmadi AI Digest May 15, 2026 Updated 6:50 AM CT

Cerebras IPO opens 2026; Microsoft drops Claude Code

Cerebras priced at $5.5B and popped 108%, opening the 2026 tech IPO window on the back of the AI compute trade. The same week, Microsoft began canceling internal Claude Code licenses, OpenAI shipped Codex on phones, and Anthropic released Claude for Legal — the platform realignment is being executed at the developer-tool layer in plain sight. Public opinion on the physical side has flipped sharply: a Gallup survey shows over 70% of Americans oppose data center construction in their area, an Oregon utility cut residents off to keep DCs running, and The Verge published an interactive backyard-DC map. Long-horizon agent research caught up with reality this week — WildClawBench replaces sandbox evals with realistic CLI work, STALE measures whether agents notice their memories are stale, and LiSA proposes lifelong safety adaptation; an Ontario audit of doctor AI notetakers showed exactly the kind of fabrications these benchmarks predict. The OpenAI trial closed in San Francisco; the jury now decides who controls the founding myth.

11 papers · 18 news · 9 sources

News

17 items

Cerebras opens the 2026 IPO window

Cerebras' $5.5B raise and 108% first-day pop reset expectations for the year's tech IPO calendar — the AI compute trade is the only ticket the public market is currently buying. Cisco's near-4,000-job restructuring and a $10M bet on a serial founder for autonomous bookkeeping show the same capital cycle reshaping incumbents and seed-stage alike.

News TechCrunch AI

Cerebras raises $5.5B, then stock pops 108%, in the first huge tech IPO of 2026

Cerebras priced at $5.5B and gained 108% on day one — the first major US tech IPO of 2026 and a strong public-market signal for AI compute.

raise $5.5B · first-day pop 108%
Why it matters
  • Reopens the IPO window for AI-infra companies that have been waiting since 2023.
  • Validates the wafer-scale architecture as a real competitor to NVIDIA, not a curiosity.
  • Sets a benchmark valuation for adjacent compute names eyeing a listing this year.

The developer-tool platform realigns

Microsoft began canceling internal Claude Code licenses just as OpenAI shipped Codex to phones and Anthropic released Claude for Legal. A nominal partner becomes a competitor at the IDE layer, and the lab fight is now most visible in the surface developers actually use day to day.

News The Verge AI

Microsoft starts canceling Claude Code licenses

Microsoft is rescinding Claude Code access from the thousands of internal developers it onboarded in December — a quiet reversal of a major Anthropic distribution channel.

Why it matters
  • Removes a meaningful Anthropic adoption surface inside Microsoft's developer org.
  • Points to GitHub Copilot consolidation as Microsoft's preferred lock-in path.
  • Tightens the Microsoft-OpenAI alliance at the exact moment Anthropic is winning broader business adoption.

News The Verge AI

OpenAI's Codex is now in the ChatGPT mobile app

OpenAI brings Codex to the ChatGPT mobile app, letting users monitor and approve coding tasks from a phone.

Why it matters
  • Coding agents move into a non-IDE surface — async approvals, not interactive editing.
  • Builds the workflow that justifies the long-running Codex deployments OpenAI is shipping into enterprises.

The data-center backlash gets concrete

Over 70% of Americans oppose AI data centers near them, per Gallup. An Oregon utility cut off residential customers to keep DC capacity online, The Verge published an interactive backyard-DC map, and xAI is reportedly running close to 50 unchecked gas turbines in Mississippi. The infrastructure expansion now has a measurable political cost.

News The Verge AI

Americans do not want AI data centers in their backyards

A new Gallup survey shows over 70% of Americans oppose AI data center construction in their area — the first hard public-opinion datapoint at scale.

opposition >70%
Why it matters
  • Gives municipalities political cover to deny or delay DC permits.
  • Forces hyperscalers to negotiate community benefits, not just power and water contracts.
  • Makes orbital and remote-rural plans look less speculative and more strategic.

Long-horizon agents meet reality

Three papers and one painful field audit converge on the same point: agent benchmarks have been measuring the wrong thing. WildClawBench replaces synthetic sandboxes with realistic CLI work, STALE asks whether agents notice their memories have gone stale, LiSA targets lifelong safety, and Ontario found doctor AI notetakers fabricating clinical content in production.

News Ars Technica AI

Your doctor's AI notetaker may be making things up, Ontario audit finds

Ontario auditors found AI medical notetakers inventing therapy referrals and incorrect prescriptions in production clinical use.

Why it matters
  • First province-level audit data showing concrete clinical fabrications from deployed AI.
  • Makes the case for trajectory-aware monitoring — exactly the research direction this week's papers point at.
  • Sets up regulatory pressure on the entire ambient-clinical-AI segment.

The OpenAI trial closes

Both sides delivered closing arguments in Musk v. Altman; the jury now decides whether OpenAI's nonprofit-to-for-profit conversion stands. The trial leaves behind a useful evidentiary record about how a frontier lab is actually run — and a literal jackass trophy.

News The Verge AI

Closing time

The Verge's closing-argument analysis of Musk v. Altman — what each side bet on with their final pitch to the jury.

Papers

3 items

Long-horizon agents meet reality

Paper Hugging Face

WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation

Replaces synthetic sandboxes and final-answer checks with realistic long-horizon CLI work, exposing the gap between benchmark agents and deployable ones.

Why it matters
  • Sets a credible procurement-grade evaluation for CLI-using agents.
  • Exposes how thoroughly current benchmarks overstate agent reliability.
  • Mirrors yesterday's Pentest-in-the-Wild paper — controlled-to-realistic is now a movement.

Also today