Sarmadi AI Digest June 3, 2026 Updated 6:55 AM CT

Microsoft Build 2026 lands; Trump signs narrower AI EO

Microsoft Build 2026 turned into a full-stack agent platform play: Scout (an OpenClaw-inspired personal assistant in Teams), MAI-Thinking-1 (Microsoft's first advanced reasoning model), Project Solara (an Android-derived OS designed for agents rather than apps), an in-house dev box on RTX Spark, and Adaptive Spec-driven Scoring for evaluating agent behavior. Trump signed a narrower-than-promised AI executive order with a voluntary review framework; industry got the EO it asked for. The UK CMA simultaneously ordered Google to let publishers opt out of AI Search Overviews. Anthropic scaled Claude Mythos to critical infrastructure in 15+ countries through Project Glasswing. Separately, SF Bay Area real-estate listings are now accepting Anthropic stock in lieu of cash, the cleanest signal that frontier-lab equity is functioning as currency in tech labor markets. NVIDIA OmniDreams ships a real-time generative world model for closed-loop autonomous-vehicle simulation. The day's research adds a hard finding: linear deception probes that claimed >0.96 AUROC collapse under distribution shift.

10 papers 24 news 8 sources

News

18 items

Microsoft Build 2026: a full-stack agent platform play

Microsoft used Build to stake out the agent platform: Scout in Teams (OpenClaw-inspired), MAI-Thinking-1 reasoning model, Project Solara as an agent-first OS, and Adaptive Spec-driven Scoring as the eval surface. Paired with the RTX Spark Surface dev box, Microsoft is now competing on every layer of the agent stack at once.

News The Verge AI

Microsoft Build 2026: The 7 biggest announcements

The Verge condenses Microsoft Build 2026 into seven plays — all of them agent-first.

Why it matters
  • Microsoft committed the entire developer event to agent infrastructure, not productivity SKUs.
  • Sets the next year's competitive surface for Google, Apple, and Anthropic.
  • Reframes Build as Microsoft's bid to control the agent OS layer.
News The Verge AI

Microsoft's first advanced reasoning AI is here

Microsoft unveiled MAI-Thinking-1, its first in-house advanced reasoning model and an explicit step toward independence from OpenAI on the reasoning frontier.

Why it matters
  • Microsoft now has both OpenAI access and a competitive in-house reasoning model — full optionality.
  • Validates the 'lab-owned and platform-owned' bet that has driven billions of dollars in Microsoft AI capex.
  • Pressures OpenAI's pricing power inside the partnership.
News The Verge AI

Microsoft's Project Solara is an OS for AI agent gadgets

Project Solara is an Android-derived OS designed for AI agent gadgets — devices where the agent, not the app, is the primary surface.

Why it matters
  • First serious 'agent-first OS' from a hyperscaler; reframes the device-form-factor race.
  • Direct Android fork puts Microsoft into ambient/hardware territory it has avoided since Windows Phone.
  • If Solara works, the next decade's consumer-AI form factor is decided here.

Regulation moves on multiple fronts

Trump signed a narrower AI executive order with a voluntary review framework — industry got the EO it lobbied for. UK CMA ordered Google to let publishers opt out of AI Search Overviews. Amazon faces a class action over Ring facial recognition. The International Mathematical Union endorsed a warning about industry encroachment.

News TechCrunch AI

Trump signs narrower executive order on AI oversight after industry objections

Trump signed a softened AI executive order — a voluntary review framework rather than mandatory pre-release evaluation — after industry pushback.

Why it matters
  • Federal posture is now explicitly industry-shaped — strengthens the case for state-level action (Illinois, Florida).
  • 'Voluntary' framework gives labs cover while leaving the question to the courts.
  • Resolves last week's reported intra-administration fight in industry's favor.
News The Verge AI

Google must let publishers opt out of AI Search features, rules UK

UK Competition and Markets Authority orders Google to let publishers opt out of AI Search Overviews while still appearing in regular search results.

Why it matters
  • First binding ruling to unbundle the AI Overviews opt-out from the broader search opt-out, a regulatory pattern others will copy.
  • Directly addresses publisher complaints that Overviews cannibalize their traffic.
  • Relevant to the ongoing CNN v. Perplexity case and broader news-IP fights.

Anthropic as critical-infra vendor — and currency

Anthropic expanded Claude Mythos to critical infrastructure across 15+ countries through Project Glasswing. In parallel, Wired documented SF Bay Area real estate listings accepting Anthropic stock in lieu of cash — frontier equity is functioning as a tech-labor-market reserve asset.

News TechCrunch AI

Anthropic scales Claude Mythos to critical infrastructure in 15+ countries

Anthropic expands Project Glasswing — its security vulnerability program — and Claude Mythos access to critical-infrastructure operators in 15+ countries.

Why it matters
  • Positions Anthropic alongside OpenAI's Daybreak and Rosalind as a frontier-lab security partner to governments.
  • Critical-infrastructure access creates regulatory tailwind for IPO disclosures.
  • Concrete operational counterweight to the Florida lawsuit narrative.

Agent safety: deception probes, security signals, world models

A pressure-test of LLM deception probes shows they collapse under distributional shift, so the headline >0.96 AUROC numbers are misleading. ClawHub Security Signals catalogs how malware detectors disagree on agent skills. NVIDIA OmniDreams ships a real-time generative world model for closed-loop autonomous-vehicle simulation. The case for treating safety as a deployment discipline, rather than a detection problem, kept getting stronger.

Scam defense, consumer hardware, and AI cost reality

Google rolled out deepfake-call detection across the Android dialer; Uber capped employee AI spending after burning through its budget in four months; Opal pivoted to AI audio gadgets with OpenAI funding. The consumer-facing AI economy is growing and, at the same time, getting more expensive than buyers planned for.

Papers

3 items

Paper Hugging Face

Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations

Linear deception probes reporting >0.96 AUROC on clean benchmarks collapse under distributional shift — the standard safety-detection metric is fragile.

Why it matters
  • Undermines current 'we have a deception detector' claims used in safety reviews.
  • Argues for distribution-shift evaluation as a default, not an optional add-on.
  • Pairs with 'The Fragility of CoT Monitoring' and 'Models That Know How Evals Are Designed': three weeks of converging evidence that detection-based safety is fragile.
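One mechanism that can produce this pattern is a probe that latches onto a feature correlated with deception only in its training data. The toy below sketches that failure mode under loudly stated assumptions: the "activations" are synthetic 2-D Gaussians rather than LLM hidden states, the probe is a simple difference-of-means linear probe (not necessarily the probe the paper uses), AUROC is computed as the Mann-Whitney pairwise win rate, and every name (sample, fit_mass_mean_probe, spurious_shift) is hypothetical. The printed numbers illustrate the mechanism only; they are not the paper's results.

```python
import random
from statistics import fmean


def sample(n, deceptive, spurious_shift, rng):
    """Draw n synthetic 2-D 'activations' (a toy stand-in, not real LLM states).

    Feature 0 is a weak but genuine deception signal (mean +1.0 when deceptive).
    Feature 1 is a spurious cue whose deceptive-class mean is spurious_shift;
    setting spurious_shift to 0.0 removes the cue, simulating distribution shift.
    """
    out = []
    for _ in range(n):
        x0 = rng.gauss(1.0 if deceptive else 0.0, 1.0)
        x1 = rng.gauss(spurious_shift if deceptive else 0.0, 1.0)
        out.append((x0, x1))
    return out


def fit_mass_mean_probe(honest, deceptive):
    """Difference-of-means linear probe: w = mean(deceptive) - mean(honest)."""
    return tuple(
        fmean(x[i] for x in deceptive) - fmean(x[i] for x in honest)
        for i in range(2)
    )


def score(w, x):
    """Linear probe score: dot product of probe weights and activation."""
    return sum(wi * xi for wi, xi in zip(w, x))


def auroc(pos_scores, neg_scores):
    """AUROC as P(score_pos > score_neg), counting ties as 1/2 (Mann-Whitney U)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def evaluate(w, spurious_shift, n, rng):
    """Score fresh honest and deceptive samples and return the probe's AUROC."""
    honest = [score(w, x) for x in sample(n, False, spurious_shift, rng)]
    deceptive = [score(w, x) for x in sample(n, True, spurious_shift, rng)]
    return auroc(deceptive, honest)


rng = random.Random(0)
# Training distribution: the spurious cue is strongly present (+4.0).
w = fit_mass_mean_probe(sample(500, False, 4.0, rng), sample(500, True, 4.0, rng))

in_dist = evaluate(w, spurious_shift=4.0, n=500, rng=rng)  # same distribution as training
shifted = evaluate(w, spurious_shift=0.0, n=500, rng=rng)  # spurious cue removed
print(f"probe weights: {w[0]:.2f}, {w[1]:.2f}")
print(f"in-distribution AUROC: {in_dist:.3f}")
print(f"shifted AUROC:         {shifted:.3f}")
```

With these settings the probe weights the spurious feature roughly four times as heavily as the genuine one, so in-distribution AUROC lands near 1.0 while the shifted AUROC falls to only a little above 0.5 (about 0.57 in expectation). The difference-of-means probe keeps the sketch free of ML libraries; a logistic-regression probe trained on the same data would likely show the same qualitative collapse, because the spurious feature is also the most discriminative one in training.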

Also today