The Week AI Got Expensive
Anthropic cuts off OpenClaw freeloaders. Google's TurboQuant compresses the KV cache to 3 bits. Mercor's breach puts every major AI lab's training secrets at risk. And the model leaderboard reshuffled. Again.
01.Anthropic vs OpenClaw
The biggest story of the week dropped on Friday. Anthropic officially ended Claude Pro and Max subscription coverage for OpenClaw and all third-party agentic tools, effective April 4 at 12pm PT.
The mechanics: if you were piping your $20/month Claude Pro subscription through OpenClaw to run autonomous agents 24/7, that path is now closed. You either switch to pay-as-you-go "Extra Usage" bundles or bring your own API key at full rates ($3/$15 per million tokens for Sonnet 4.6, $15/$75 for Opus 4.6).
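For scale, here's a back-of-envelope cost sketch at those pay-as-you-go rates. The token volumes are illustrative guesses for a heavy agent workload, not measurements:

```python
# Rough cost of bringing your own API key, using the per-million-token
# rates quoted above. Token counts are illustrative, not measured.
def api_cost(input_tokens: int, output_tokens: int,
             in_rate: float, out_rate: float) -> float:
    """Dollar cost given per-million-token input/output rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A hypothetical heavy agent day: 50M input tokens, 2M output tokens.
sonnet_day = api_cost(50_000_000, 2_000_000, in_rate=3.0, out_rate=15.0)
print(f"Sonnet 4.6, one heavy day: ${sonnet_day:.2f}")   # $180.00

opus_day = api_cost(50_000_000, 2_000_000, in_rate=15.0, out_rate=75.0)
print(f"Opus 4.6, same volume:     ${opus_day:.2f}")     # $900.00
```

At those volumes, a single day of agent traffic dwarfs the $20/month subscription, which is the whole story in two numbers.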
Why it happened
Boris Cherny, Head of Claude Code, explained that third-party harnesses bypass Anthropic's prompt cache optimization layer. One heavy OpenClaw session consumes dramatically more compute than an equivalent Claude Code session at the same output volume. Community estimates suggest roughly 60% of active OpenClaw sessions were running on subscription credits. At scale, that math breaks.
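To see why the cache layer matters, consider a rough model of an agent loop: every turn re-sends the full conversation history, so uncached input grows quadratically with turn count. The 10x cache-read discount below is an assumed illustrative figure, not a quoted rate:

```python
# Why bypassing the prompt cache blows up compute: each agent turn
# re-processes all prior turns of context.
def total_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Uncached: turn k re-sends all k prior turns at full price."""
    return sum(k * tokens_per_turn for k in range(1, turns + 1))

def cached_equivalent_tokens(turns: int, tokens_per_turn: int,
                             cache_discount: float = 0.1) -> float:
    """Cached: only the newest turn is full price; the rest hit the cache."""
    full = turns * tokens_per_turn                      # fresh tokens per turn
    cached = total_input_tokens(turns, tokens_per_turn) - full
    return full + cached * cache_discount

uncached = total_input_tokens(100, 2_000)   # 100-turn session, 2K tokens/turn
cached = cached_equivalent_tokens(100, 2_000)
print(f"uncached-equivalent tokens: {uncached:,}")      # 10,100,000
print(f"cached-equivalent tokens:   {cached:,.0f}")     # 1,190,000
print(f"ratio: {uncached / cached:.1f}x")
```

Under these made-up but plausible numbers, a harness that skips the cache costs roughly 8x more compute for the same session. That's the math Cherny is pointing at.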
The drama
OpenClaw creator Peter Steinberger (who joined OpenAI in February) fired back immediately. He and investor Dave Morin reportedly tried to negotiate a softer landing, only managing to delay enforcement by one week. Steinberger's accusation: "First they copy some popular features into their closed harness, then they lock out open source."
thinkidiot take: This was inevitable. Flat-rate subscriptions and autonomous agents running around the clock are fundamentally incompatible. The real question: will OpenAI hold its pricing when it faces the same demand? Steinberger now works there. The irony writes itself.
02.TurboQuant Deep Dive
Google Research published TurboQuant on March 24; the paper is headed to ICLR 2026. The internet immediately called it the real-life Pied Piper from HBO's Silicon Valley. The comparison is fair: it compresses the KV cache, the single biggest memory bottleneck during LLM inference, down to 3-4 bits per element with negligible quality loss.

How it works (the short version)
Stage 1: PolarQuant. Apply a random orthogonal rotation to each KV vector. This spreads energy uniformly across all coordinates, making each one follow a predictable Beta distribution. Now you can use a mathematically optimal Lloyd-Max quantizer per coordinate. No calibration data, no model-specific tuning.
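A minimal NumPy sketch of the Stage 1 idea, with a plain uniform quantizer standing in for the paper's Lloyd-Max codebook (the optimal per-coordinate quantizer is TurboQuant's actual contribution; everything below is an illustrative simplification):

```python
import numpy as np

# Rotate a KV vector with a random orthogonal matrix, quantize each
# coordinate with the same scalar codebook, then undo the rotation.
rng = np.random.default_rng(0)
d = 128                                   # head dimension (illustrative)

# Random orthogonal rotation via QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

def quantize(v: np.ndarray, bits: int = 3) -> tuple[np.ndarray, float]:
    """Uniform scalar quantization; returns integer codes and the scale."""
    levels = 2 ** bits
    scale = np.abs(v).max() / (levels / 2)
    codes = np.clip(np.round(v / scale), -levels // 2, levels // 2 - 1)
    return codes.astype(np.int8), scale

def dequantize(codes: np.ndarray, scale: float) -> np.ndarray:
    return codes.astype(np.float64) * scale

v = rng.standard_normal(d)                # a KV vector
rotated = Q @ v                           # energy spread across coordinates
codes, scale = quantize(rotated)          # 3 bits/element + one shared scale
recon = Q.T @ dequantize(codes, scale)    # inverse rotation after dequantizing

err = np.linalg.norm(recon - v) / np.linalg.norm(v)
print(f"relative reconstruction error at 3 bits: {err:.3f}")
```

The rotation is the trick: without it, a few outlier coordinates would force a huge scale and wreck everyone else's precision; after it, one codebook fits all.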
Stage 2: QJL. Use 1 bit of residual capacity to correct the small quantization bias left over from Stage 1 via the Quantized Johnson-Lindenstrauss algorithm. This eliminates systematic inner product error.
The community plot twist
Multiple independent reproductions (at least 6 teams) found that QJL actually hurts performance in practice for attention-based workloads. The reason: softmax exponentially amplifies variance, and QJL trades bias for variance. Lower bias + higher variance = worse top-1 token accuracy after softmax. The fix: just skip QJL and allocate all bits to MSE-optimal reconstruction.
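The variance point is easy to demonstrate with a toy experiment (all magnitudes below are invented for illustration): a systematic, rank-preserving bias barely disturbs the top-1 choice, while unbiased noise of similar order flips it constantly.

```python
import numpy as np

# For top-1 token selection, what matters is the *ranking* of logits.
# A systematic shrinkage bias preserves ranking; unbiased noise does not.
rng = np.random.default_rng(42)
trials, n = 10_000, 64
biased_flips = unbiased_flips = 0

for _ in range(trials):
    logits = rng.standard_normal(n)
    true_top = logits.argmax()
    # Estimator A: biased (10% shrinkage) but low variance.
    a = 0.9 * logits + rng.normal(0.0, 0.05, n)
    # Estimator B: unbiased, but 6x the noise (bias traded for variance).
    b = logits + rng.normal(0.0, 0.30, n)
    biased_flips += int(a.argmax() != true_top)
    unbiased_flips += int(b.argmax() != true_top)

print(f"top-1 flips, biased low-variance:    {biased_flips / trials:.1%}")
print(f"top-1 flips, unbiased high-variance: {unbiased_flips / trials:.1%}")
```

Softmax then exponentiates whatever error survives, so the high-variance estimator's ranking mistakes translate directly into wrong tokens, which is consistent with what the reproduction teams reported.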
thinkidiot take: TurboQuant matters most for on-device and edge inference. Morgan Stanley noted it won't reduce total HBM demand for training. But for serving: longer context windows, bigger batch sizes, cheaper inference. All on the same hardware. The llama.cpp integration is already in progress. If you're doing local LLM inference, this is the paper to read this month.
03.Mercor Breach
Mercor, the $10 billion AI data startup that recruits domain experts to generate training data for OpenAI, Anthropic, and Meta, confirmed it was hit by a supply-chain attack via the open-source LiteLLM library. Hacking group TeamPCP claimed possession of 4 TB of stolen data, including 939 GB of source code and a 211 GB user database containing video interviews and identity verification documents.
Meta has indefinitely paused all work with Mercor, and other major labs are re-evaluating their relationships with the startup. A class-action lawsuit has already been filed on behalf of 40,000+ affected individuals.
The attack chain
TeamPCP planted malicious code inside LiteLLM, a Python library with 97 million monthly downloads used by AI developers worldwide. The library was infected with credential-harvesting malware. Mercor was one of thousands of downstream victims, but its position as a central node in the AI training data supply chain makes it uniquely consequential.
Why this matters beyond Mercor: The stolen data may include details about secretive AI training projects from Mercor's clients. If those datasets or project details leaked to competitors, including labs in other countries, the competitive implications are enormous. Mandiant estimates 1,000+ SaaS environments have been impacted by the broader TeamPCP campaign, with numbers expected to grow.
04.Model Scorecard
The frontier is now a three-way (maybe four-way) tie. Here's where things stand as of this week:
Gemini 3.1 Pro
Leads 13 of 16 major benchmarks. Scored 94.3% on GPQA Diamond. Ties with GPT-5.4 Pro on the Artificial Analysis Intelligence Index, at roughly one-third the API cost. Google also shipped Gemini 3.1 Flash-Lite for latency-sensitive deployments: 2.5x faster response times.
Claude Sonnet 4.6
Leads the GDPval-AA Elo benchmark for real expert-level work. GitHub Copilot's coding agent now runs on it. Anthropic's leaked "Claude Mythos" (codenamed Capybara) remains in early access with cybersecurity partners only. No public date.
GPT-5.4
The "Thinking" variant scored 75.0% on OSWorld-Verified, a 27.7 point jump over GPT-5.2, officially surpassing human-level performance on desktop task benchmarks. GPT-5.5 (codenamed "Spud") expected in Q2.
Grok 4.20
xAI introduced a novel multi-agent architecture. The full model is still training. If API access lands in April, it deserves serious evaluation, particularly for applications needing live social data integration.
Llama 4 Maverick
400B parameters. 10M context window. The strongest open-weight option available. Runs free on your own infrastructure.
thinkidiot take: The performance gap between frontier models is now measured in workflow fit and cost, not raw capability. That's a fundamentally different market than 2024. For practitioners: stop chasing benchmarks. Start profiling which model fits your actual latency, cost, and tool-use requirements.
05.OpenClaw 302K Stars
OpenClaw is now the fastest-growing open source project in GitHub history, surpassing 302,000 stars. It beat React's 10-year record in about 60 days. Jensen Huang called it "probably the single most important release of software, probably ever."
But the security story is less flattering. 9+ CVEs in its first two months. 42,665 exposed instances discovered. Cisco's AI security team found a third-party OpenClaw skill performing data exfiltration and prompt injection without user awareness. China has restricted government agencies from running OpenClaw. One of OpenClaw's own maintainers warned that if you can't understand the command line, the project is "far too dangerous" for you.
NVIDIA responded with NemoClaw (released March 16): an enterprise security add-on featuring OpenShell sandboxing, which isolates every agent action inside a secure container. It addresses the most critical attack vectors but requires additional setup.
thinkidiot take: OpenClaw is the most exciting and the most dangerous open source project happening right now. The agent-as-personal-assistant paradigm is real. But giving an LLM shell access, browser control, and email permissions on a loop, with a plugin ecosystem that has minimal vetting, is exactly the attack surface security researchers have been warning about. Proceed, but with NemoClaw. And review every SKILL.md before installing.
06.Quick Hits
MLPerf Inference v6.0 released. MLCommons dropped the first major benchmark release of the year. 24 organizations participated, with five new processors and new entrants from both industry and academia. Five new models were added, and one was updated for lower latency.
Q1 2026 AI funding: $267.2 billion. A record-shattering quarter, dominated by OpenAI, Anthropic, and the SpaceX-xAI acquisition ($250B). The money is real. The infrastructure buildout is accelerating.
Mark Zuckerberg is coding again. Sources report the Meta CEO submitted three diffs to Meta's monorepo after a two-decade hiatus. He's reportedly a heavy user of Claude Code CLI. Make of that what you will.
Microsoft Copilot gets multi-model workflows. The new "Critique" feature pairs one model for generation and another for accuracy review. "Model Council" enables side-by-side comparisons. Copilot Cowork agent expanding access.
Anthropic forms AnthroPAC. Anthropic has created a political action committee funded voluntarily by employees. Expected to be bipartisan. The AI policy game is on.
KV cache offloading via CXL. Astera Labs' Leo CXL Smart Memory Controllers are enabling KV cache offloading beyond GPU memory limits. Combined with TurboQuant-style compression, this could reshape inference economics significantly.
Sources
Anthropic vs OpenClaw:
- Anthropic says Claude Code subscribers will need to pay extra for OpenClaw usage — TechCrunch
- Anthropic cuts off third-party tools like OpenClaw — The Decoder
- Anthropic cuts off the ability to use Claude subscriptions with OpenClaw — VentureBeat
TurboQuant:
- TurboQuant: Redefining AI efficiency with extreme compression — Google Research
- Google unveils TurboQuant, the internet is calling it 'Pied Piper' — TechCrunch
- TurboQuant - Extreme KV Cache Quantization Discussion — llama.cpp GitHub
Mercor Breach:
- Mercor says it was hit by cyberattack tied to LiteLLM — TechCrunch
- Meta suspends work with Mercor after security breach — Cybernews
- Mercor, a $10 billion AI startup, confirms major breach — Fortune
- AI Firm Mercor Confirms Breach as Hackers Claim 4TB of Stolen Data — Hackread
Model Scorecard:
- Gemini 3.1 Pro: A smarter model for your most complex tasks — Google Blog
- Gemini 3.1 Pro Model Card — Google DeepMind
- OpenAI launches GPT-5.4 with Pro and Thinking versions — TechCrunch
- Introducing GPT-5.4 — OpenAI
- Grok 4.20 Beta: xAI's Native 4-Agent Multi-Agent Architecture — Medium
OpenClaw & NemoClaw:
- OpenClaw Just Beat React's 10-Year GitHub Record in 60 Days — Medium
- OpenClaw rocks to GitHub's most-starred status, but is it safe? — The New Stack
- NVIDIA Announces NemoClaw for the OpenClaw Community — NVIDIA Newsroom
- Nvidia's version of OpenClaw could solve its biggest problem: security — TechCrunch
Quick Hits:
- MLCommons Releases New MLPerf Inference v6.0 Benchmark Results — MLCommons
- PitchBook: US venture funding surges to record $267B — SiliconANGLE
- Zuckerberg Is Writing Code Again. With Claude Code. — Sameer Khan
- Copilot Cowork: Now available in Frontier — Microsoft 365 Blog
- Anthropic ramps up its political activities with a new PAC — TechCrunch
- How CXL Transforms RAG and KV Cache Performance — Astera Labs
Try the related labs
Join the Idiots
New lab every Sunday. No spam, unsubscribe anytime.