Thin Saturday — NVIDIA opens a 2.6B world model; arXiv slop ban hardens

A quiet weekend cycle with two real signals. NVIDIA's SANA-WM hit Hacker News with 348 points: a 2.6B open-source world model that generates a full minute of 720p video on commodity GPUs, the kind of open release that lowers the cost floor for video-AI startups. The arXiv slop ban tightened with a clearer enforcement story: a yearlong submission ban for authors who let LLMs do all the work. The AI 'haves and have nots' framing landed in TechCrunch, a useful read on why the industry's mood has soured even as the numbers grow. Underneath, a thin but practical research drop: few-shot guidance for verifiable-reward RL, a debiased evaluation audit of omni-modal benchmarks, and a sphere-aware flow-matching tweak that fixes a quiet bug in latent image diffusion.

6 papers · 5 news · 4 sources

News

4 items

NVIDIA opens a 2.6B video world model

NVIDIA Labs released SANA-WM, a 2.6B open-source world model that generates a minute of 720p video on commodity hardware. Open weights at this size and quality compress the moat for closed video-AI vendors and give SMB builders a credible foundation to fine-tune.

News Hacker News

SANA-WM, a 2.6B open-source world model for 1-minute 720p video

NVIDIA Labs released SANA-WM — a 2.6B-parameter open-source world model that generates one minute of 720p video at consumer-GPU scale (348 HN points).

params 2.6B · video length 1 min · resolution 720p

Why it matters

Open weights at this size make video-AI viable to fine-tune without lab-scale compute.
Compresses the moat for closed video-generation vendors (Runway, Pika, OpenAI Sora) on the consumer tier.
Sets a new reference point for what 'small' world models can do — relevant for any team training on game/sim data.

arXiv's slop ban gets a sharper edge

TechCrunch's writeup of arXiv's new policy makes the enforcement explicit: a yearlong submission ban for authors who let LLMs do all the work. After this week's first wave of coverage, the policy is being read as a model that other preprint servers and journals may follow.

News TechCrunch AI

Research repository ArXiv will ban authors for a year if they let AI do all the work

TechCrunch confirms arXiv's enforcement: authors found submitting LLM-generated content without substantive contribution face a one-year submission ban.

Why it matters

Concrete enforcement mechanism — not just policy language — sets a credible precedent for other archives.
Improves the signal-to-noise ratio for any team using arXiv as a sourcing channel.

policy regulation data

The AI gold rush's haves and have-nots

TechCrunch crystallized the mood with a 'haves and have nots' read on the current cycle — the industry is making money and laying people off at the same time, and the dissonance is becoming the dominant story for non-flagship operators. Greg Brockman's product consolidation at OpenAI is one face of it; SMBs trying to keep up are the other.

News TechCrunch AI

The haves and have nots of the AI gold rush

TechCrunch on why the AI cycle's vibes have soured even inside tech — uneven distribution of wins and a widening operator gap.

market policy

News TechCrunch AI

OpenAI co-founder Greg Brockman takes charge of product strategy

TechCrunch's read on the OpenAI reorg as ChatGPT and Codex consolidate under Brockman's product leadership.

products market

Papers

3 items

Thin but practical research drop

A short list of Hugging Face papers worth a scan: few-shot guidance for verifiable-reward RL on hard problems, a debiased evaluation audit of omni-modal benchmarks, and a sphere-aware flow-matching fix that addresses a quiet correctness gap in latent image diffusion.

Paper Hugging Face

Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance

Demonstration-guided RLVR with random few-shot examples improves sample efficiency on hard problems where correct rollouts are otherwise unreachable.

reinforcement-learning fine-tuning

Paper Hugging Face

Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation

Audit shows omni-modal benchmark gains are inflated by visual shortcuts; staged post-training under a debiased eval setting gives a truer measure.

multimodal evaluation benchmarks

Paper Hugging Face

Aligning Latent Geometry for Spherical Flow Matching in Image Generation

Flow matching with sphere-aware radial/angular decomposition fixes a quiet correctness bug in standard linear-path latent image generation.

diffusion image-generation
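
For readers who want intuition for the radial/angular idea, here is a minimal sketch. It is not the paper's method. It assumes, purely for illustration, that the issue with straight-line paths is that the midpoint between two high-dimensional Gaussian latents has a smaller norm than either endpoint, so the path leaves the shell where latents typically live. The helper names (`decompose`, `lerp`, `slerp`) and the dimension of 256 are hypothetical choices for this sketch.

```python
import math
import random


def norm(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def decompose(v):
    """Split a vector into its radius (length) and unit direction."""
    r = norm(v)
    return r, [x / r for x in v]


def lerp(a, b, t):
    """Straight-line interpolation between a and b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def slerp(a, b, t):
    """Interpolate the direction along a great circle and the radius linearly."""
    ra, ua = decompose(a)
    rb, ub = decompose(b)
    cos_omega = max(-1.0, min(1.0, sum(x * y for x, y in zip(ua, ub))))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < 1e-8:
        # Directions are (anti)parallel: the great circle is ill-defined,
        # so fall back to the straight line.
        return lerp(a, b, t)
    wa = math.sin((1 - t) * omega) / sin_omega
    wb = math.sin(t * omega) / sin_omega
    direction = [wa * x + wb * y for x, y in zip(ua, ub)]
    radius = (1 - t) * ra + t * rb
    return [radius * x for x in direction]


# Two stand-in "latents": independent Gaussian vectors (an assumption,
# not data from the paper).
random.seed(0)
dim = 256
a = [random.gauss(0, 1) for _ in range(dim)]
b = [random.gauss(0, 1) for _ in range(dim)]

mid_lerp = norm(lerp(a, b, 0.5))
mid_slerp = norm(slerp(a, b, 0.5))

print("endpoint norms:", norm(a), norm(b))
print("straight-line midpoint norm:", mid_lerp)
print("radial/angular midpoint norm:", mid_slerp)
```

The straight-line midpoint is always shorter than the radial/angular one unless the two vectors point the same way, which is the triangle inequality at work. Whether this is the exact gap the paper targets is an assumption here; the paper itself is the reference for the actual formulation.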

Also today

News · Hacker News · Zerostack: A Unix-inspired coding agent written in pure Rust. Open-source and Unix-flavored, with 428 HN points; a useful counterpoint to the Claude/Codex duopoly at the dev-tool layer.
News · The Verge AI · Sony tries to explain that its AI Camera Assistant doesn't suck. Sony pushed back on viral criticism of the Xperia 1 XIII AI Camera Assistant; small-camera AI is now a flagship-phone marketing battleground.
Paper · Hugging Face · Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image. Generates full street-level 3D scenes from a single satellite image, bridging the tradeoff between geometry-only and proxy-based scene generation methods.
Paper · Hugging Face · Realiz3D: 3D Generation Made Photorealistic via Domain-Aware Learning. A domain-aware learning step that lifts 3D generation quality toward photorealism without re-architecting the base generator.
Paper · Hugging Face · CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves. Tests exact topological reasoning over nested Jordan curves, a clean check of whether reasoning models can do geometry without arithmetic shortcuts.
Paper · Hugging Face · PRISM: Prior Rectification and Uncertainty-Aware Structure Modeling for Diffusion-Based Text Image Super-Resolution. Adds prior rectification and uncertainty-aware structure modeling to diffusion-based text image super-resolution; practical for document AI pipelines.
Paper · Hugging Face · ViMU: Benchmarking Video Metaphorical Understanding. A benchmark for video models on metaphor that pushes evaluation past literal scene description toward figurative comprehension.