DeepSeek makes its V4 Pro price cut permanent; agents move onto real tools

DeepSeek made its V4 Pro price discount permanent, the day's clearest market signal — sustained price pressure from the open-weight frontier keeps compressing what proprietary labs can charge. The research feeds pushed agents toward the surfaces businesses actually use: Spreadsheet-RL trains agents on real Excel and Sheets tasks, TerminalWorld reverse-engineers evaluation from in-the-wild terminal work, and π-Bench measures proactive personal-assistant agents over long horizons. Clinical agents got more rigorous with multimodal evidence-seeking and clinical-event prediction. Efficient attention and KV serving stayed busy — full-to-sparse attention transfer in a hundred training steps, a second-generation gated DeltaNet, and service-aware cache compression for disaggregated serving.

15 papers · 1 news · 2 sources

News

1 item

DeepSeek resets the price floor

DeepSeek made its V4 Pro price discount permanent. Open-weight frontier pricing keeps dropping, and each permanent cut narrows the price range proprietary labs can charge for comparable capability, a direct tailwind for cost-sensitive SMB builders.

News · Hacker News

DeepSeek makes the V4 Pro price discount permanent

DeepSeek converted its temporary V4 Pro price discount into permanent pricing, locking in another cut to frontier-grade API costs (534 HN points).

Why it matters

Sustained open-weight price pressure compresses what OpenAI, Anthropic, and Google can charge for equivalent tiers.
Lowers the per-token cost floor for agentic workloads where volume dominates the bill.
Strengthens the case for multi-model routing with DeepSeek on the cheap tier.
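
The routing point above can be made concrete with a little arithmetic. The sketch below is illustrative only: the tier names, the per-million-token prices, and the `needs_frontier` flag are hypothetical placeholders, not DeepSeek's or any other provider's actual pricing or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    usd_per_million_tokens: float  # hypothetical blended price, not a real quote


# Placeholder tiers; substitute the prices your providers actually publish.
CHEAP = Tier("cheap-open-weight", 1.0)
PREMIUM = Tier("premium-proprietary", 10.0)


def route(needs_frontier: bool) -> Tier:
    """Send a request to the premium tier only when it is flagged as hard."""
    return PREMIUM if needs_frontier else CHEAP


def bill(requests: list[tuple[int, bool]]) -> float:
    """Total cost in USD for a list of (token_count, needs_frontier) requests."""
    return sum(
        tokens / 1_000_000 * route(hard).usd_per_million_tokens
        for tokens, hard in requests
    )


# Nine routine agent calls and one hard call, 1M tokens each.
workload = [(1_000_000, False)] * 9 + [(1_000_000, True)]
routed_cost = bill(workload)
premium_only_cost = bill([(tokens, True) for tokens, _ in workload])
print(routed_cost, premium_only_cost)  # 19.0 100.0
```

With these made-up prices, routing nine of ten calls to the cheap tier cuts the bill from 100.0 to 19.0. The real saving depends on actual prices and on how many requests genuinely need the premium tier.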

Papers

11 items

Agents move onto real business tools

Three papers pull agent training and evaluation onto the surfaces businesses actually run on: spreadsheets, terminals, and proactive long-horizon assistance. The shared move is deriving tasks from real-world usage rather than synthetic sandboxes, the same controlled-to-realistic shift seen all week. A fourth paper, Maestro, applies RL to orchestrating ensembles of models and skills.

Paper · Hugging Face

Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning

Trains agents with RL on realistic Excel and Google Sheets tasks — the data-work surface most businesses actually live in.

Why it matters

Spreadsheets are the highest-leverage, least-glamorous automation target for SMBs.
RL on realistic tasks beats prompt-only approaches on the messy operations that matter.
Direct relevance to finance, ops, and analytics teams evaluating agent tooling.

agents reinforcement-learning products tool-use

Paper · Hugging Face

TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks

A data engine that auto-reverse-engineers high-fidelity evaluation tasks from in-the-wild terminal usage.

agents evaluation benchmarks code

Paper · Hugging Face

π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows

Benchmark for proactive personal-assistant agents across long-horizon, everyday workflows.

agents evaluation benchmarks products

Paper · Hugging Face

Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

Uses RL to orchestrate hierarchical ensembles of models and skills for autonomous agents.

agents reinforcement-learning tool-use

Clinical agents get more rigorous

Healthcare AI research moved past assuming evidence is handed to the model: ClinSeekAgent automates multimodal evidence-seeking for clinical reasoning, and a separate effort trains LLMs to predict clinical events from longitudinal notes. Both respond to the deployment-risk concerns the field has been working through this month.

Paper · Hugging Face

ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning

Agentic clinical reasoning that actively seeks multimodal evidence rather than assuming it is already in context.

Why it matters

Addresses the core failure mode behind hallucinated clinical notes — missing evidence retrieval.
Aligns with CHI-Bench's push for policy-rich, multi-step healthcare evaluation.

agents multimodal evaluation

Paper · Hugging Face

Training Large Language Models to Predict Clinical Events

Converts longitudinal clinical notes into supervision for predicting how patients evolve over time.

data training evaluation

Attention and serving keep getting cheaper

Efficiency work continued across the stack: converting full attention to sparse in a hundred training steps, a second-generation gated DeltaNet that decouples erase and write in linear attention, and service-aware KV-cache compression for disaggregated serving.

Paper · Hugging Face

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

Converts a full-attention model to efficient sparse attention in roughly a hundred training steps while preserving quality.

attention long-context inference

Paper · Hugging Face

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Separates erase and write operations in linear attention's recurrent state for better sequence modeling at linear cost.

attention inference
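
To see what decoupling erase from write could mean, it helps to start from the classic delta rule for a linear-attention memory, S <- S - beta * (S k - v) k^T, where a single beta both removes the old value stored under key k and writes the new one. The sketch below splits that beta into two hypothetical knobs, `erase` and `write`. This split is an assumption made for illustration, not the paper's actual parameterization, and the scalar `decay` gate is likewise a simplified stand-in.

```python
def step(state, key, value, erase, write, decay=1.0):
    """One recurrent update of a d_v x d_k key-value memory (illustrative only).

    state[i][j] stores how strongly key dimension j maps to value dimension i.
    erase: fraction of the value currently stored under `key` to remove.
    write: fraction of the new `value` to store under `key`.
    With erase == write this reduces to the (gated) delta rule.
    """
    # What the memory currently returns for this key: state @ key.
    recalled = [sum(s * k for s, k in zip(row, key)) for row in state]
    return [
        [decay * (s - erase * r * k) + write * v * k for s, k in zip(row, key)]
        for row, r, v in zip(state, recalled, value)
    ]


def read(state, query):
    """Query the memory: state @ query."""
    return [sum(s * q for s, q in zip(row, query)) for row in state]


key = [1.0, 0.0]  # unit-norm key
empty = [[0.0, 0.0], [0.0, 0.0]]
memory = step(empty, key, [1.0, 2.0], erase=1.0, write=1.0)

# Coupled behavior: erase the old value fully, then write the new one.
replaced = step(memory, key, [3.0, 4.0], erase=1.0, write=1.0)
# Decoupled extreme: no erase, so the new value piles on top of the old one.
accumulated = step(memory, key, [3.0, 4.0], erase=0.0, write=1.0)

print(read(replaced, key))     # [3.0, 4.0]
print(read(accumulated, key))  # [4.0, 6.0]
```

Setting `erase=0.0` recovers the purely additive update of plain linear attention, while `erase == write` recovers the delta rule, so the two knobs span both behaviors in this toy version.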

Paper · Hugging Face

KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving

Service-aware KV-cache compression that cuts communication cost in disaggregated (prefill/decode-separated) serving.

inference infrastructure quantization
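
One generic way to shrink the KV cache that travels between prefill and decode workers is to quantize it before sending. The sketch below uses plain symmetric int8 quantization with one scale per block. It is not KVServe's method, which this digest does not describe; the block contents, the fp16 baseline, and the 4-byte scale are all assumptions chosen for illustration.

```python
def quantize_int8(values):
    """Symmetric per-block int8 quantization: returns (scale, integer codes)."""
    peak = max(abs(v) for v in values)
    scale = peak / 127 if peak else 1.0  # avoid dividing by zero on an all-zero block
    return scale, [round(v / scale) for v in values]


def dequantize(scale, codes):
    """Approximate reconstruction on the receiving (decode) side."""
    return [c * scale for c in codes]


def payload_bytes(num_values, bytes_per_value, scale_bytes=0):
    """Bytes sent over the link for one block of cache values."""
    return num_values * bytes_per_value + scale_bytes


# Toy block of cache values; a real cache holds many such blocks per layer.
block = [0.5, -1.27, 0.0, 1.0]
scale, codes = quantize_int8(block)
restored = dequantize(scale, codes)
max_error = max(abs(a - b) for a, b in zip(block, restored))

# Transfer size for a 4096-value block: fp16 baseline vs int8 plus one fp32 scale.
fp16_bytes = payload_bytes(4096, 2)                # 8192
int8_bytes = payload_bytes(4096, 1, scale_bytes=4)  # 4100
print(max_error <= scale / 2 + 1e-9, fp16_bytes, int8_bytes)
```

The rounding error per value is bounded by half the scale, which is the trade being made for roughly halving the bytes on the wire in this toy setup.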

Sharper credit assignment for RLVR

Two papers continue the week's RLVR refinement thread — discriminative token-level credit assignment and unsupervised process reward models — both aimed at giving training a signal more precise than one reward per rollout.

Paper · Hugging Face

DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

Assigns credit to individual tokens within an RLVR rollout rather than rewarding all tokens equally.

reinforcement-learning reasoning fine-tuning
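
The contrast between uniform and token-level credit can be sketched in a few lines. The per-token `scores` below are hypothetical inputs; how DelTA actually computes its discriminative weights is not described here, so this only illustrates the general shape of the idea.

```python
def uniform_credit(advantage, num_tokens):
    """Baseline: every token in the rollout gets the same advantage."""
    return [advantage] * num_tokens


def weighted_credit(advantage, scores):
    """Scale the rollout advantage by per-token scores normalized to mean 1.

    `scores` is a hypothetical importance signal (one positive number per token).
    Normalizing to mean 1 keeps the total credit equal to the uniform case.
    """
    mean = sum(scores) / len(scores)
    return [advantage * s / mean for s in scores]


rollout_advantage = 1.0
scores = [0.5, 0.5, 2.0, 1.0]  # made-up values: the third token mattered most

uniform = uniform_credit(rollout_advantage, len(scores))
weighted = weighted_credit(rollout_advantage, scores)

print(uniform)   # [1.0, 1.0, 1.0, 1.0]
print(weighted)  # [0.5, 0.5, 2.0, 1.0]
```

Both lists sum to the same total, so the weighted version redistributes the rollout's reward across tokens rather than changing how much reward there is.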

Paper · Hugging Face

Unsupervised Process Reward Models

Builds step-level process reward models without step-level human labels.

reinforcement-learning reasoning

Also today

Paper · Hugging Face WorldKV: Efficient World Memory with World Retrieval and Compression — Persistent world memory for action-conditioned video world models via retrieval and compression.
Paper · Hugging Face PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects — Generates simulation-ready physical 3D assets spanning rigid, deformable, and articulated objects.
Paper · Hugging Face Efficient Agentic Reasoning Through Self-Regulated Simulative Planning — Lets agents decide when and how much to plan, instead of always-on reactive computation.
Paper · Hugging Face GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation — Image-generation agents that self-improve by distilling tool-orchestrated visual experience.
Paper · Hugging Face Forecasting Downstream Performance of LLMs With Proxy Metrics — Uses proxy metrics to forecast downstream LLM performance for architecture and corpus decisions.
Paper · Hugging Face AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment — Rule-based reward model for aligning text-to-image generation with human preferences.
Paper · Hugging Face Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators — Interactive streaming music generation for live performance and co-creation.
Paper · Hugging Face "I didn't Make the Micro Decisions": Measuring, Inducing, and Exposing Goal-Level AI Contributions in Collaboration — Studies how LLMs shape user goals and how to attribute goal-level contributions in human-AI collaboration.
Paper · Hugging Face Platonic Representations in the Human Brain: Unsupervised Recovery of Universal Geometry — Tests the Strong Platonic Representation Hypothesis against human brain data via unsupervised geometry recovery.