The Peon Post Digest 10 stories

Anthropic Is Turning Agent Engineering Into Infrastructure: Evals, Context, Skills, and Distribution

Anthropic makes the case for serious agent evals: single-turn tests are not enough

Source: Anthropic Engineering

Key points: Anthropic argues that the capabilities that make agents useful also make them hard to evaluate: multi-turn execution, tool calls, state changes, and adaptive planning. A useful eval is not just a final answer score. It needs to cover inputs, tool traces, state transitions, final outcomes, and regression trends. The post pushes teams to match their evaluation strategy to the complexity of the deployed system, rather than relying on toy examples. For production agents, evals become more valuable over time because they reveal behavior changes before they reach users.

Peon take: This is the most important read today. Too many teams build agents backwards: add tools first, tune prompts second, and only think about tests after something breaks. Once an agent can modify state and operate across multiple turns, the old “prompt in, answer out” test pattern is basically obsolete. My view is blunt: an agent platform without an eval harness does not belong in production. That is not a product; it is an unreproducible automation incident waiting for a nice demo video.
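To make the eval dimensions above concrete (tool traces, state transitions, final outcomes, regression trends), here is a minimal sketch of a multi-check agent eval in Python. It is not Anthropic's harness or any real library's API: every name in it (`EvalCase`, `Transcript`, `grade`, `regressions`, `toy_agent`, the refund scenario) is a hypothetical chosen for illustration, and a real agent would replace the toy one.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolCall:
    name: str
    args: dict[str, Any]


@dataclass
class Transcript:
    """Everything one agent run produced, not just its last message."""
    tool_calls: list[ToolCall] = field(default_factory=list)
    final_state: dict[str, Any] = field(default_factory=dict)
    final_answer: str = ""


@dataclass
class EvalCase:
    name: str
    task: str
    initial_state: dict[str, Any]
    expected_tools: list[str]        # tool names, in the order they should run
    expected_state: dict[str, Any]   # keys that must hold these values afterwards
    answer_check: Callable[[str], bool]


def grade(case: EvalCase, t: Transcript) -> dict[str, bool]:
    """Score one run on three separate checks instead of one final-answer score."""
    return {
        "tool_trace": [c.name for c in t.tool_calls] == case.expected_tools,
        "state": all(t.final_state.get(k) == v
                     for k, v in case.expected_state.items()),
        "outcome": case.answer_check(t.final_answer),
    }


def regressions(baseline: dict[str, dict[str, bool]],
                current: dict[str, dict[str, bool]]) -> list[tuple[str, str]]:
    """List (case, check) pairs that passed in the baseline but fail now."""
    return [(case, check)
            for case, checks in baseline.items()
            for check, ok in checks.items()
            if ok and not current.get(case, {}).get(check, False)]


def toy_agent(task: str, state: dict[str, Any]) -> Transcript:
    """Hypothetical stand-in for a real agent: refunds order A1 by editing state."""
    new_state = dict(state)
    calls = [ToolCall("lookup_order", {"order_id": "A1"}),
             ToolCall("issue_refund", {"order_id": "A1"})]
    new_state["A1"] = "refunded"
    return Transcript(calls, new_state, "Order A1 has been refunded.")


case = EvalCase(
    name="refund_order",
    task="Refund order A1",
    initial_state={"A1": "paid"},
    expected_tools=["lookup_order", "issue_refund"],
    expected_state={"A1": "refunded"},
    answer_check=lambda answer: "refunded" in answer.lower(),
)

result = grade(case, toy_agent(case.task, case.initial_state))
print(result)  # {'tool_trace': True, 'state': True, 'outcome': True}
```

Storing the `grade` output per case for each release and passing two such snapshots to `regressions` is one simple way to surface behavior changes before they reach users; which checks matter, and how strict the tool-order comparison should be, depends on the deployed system.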

OpenAI Unveils Universal Codex Platform, Amazon Bids $80B for Anthropic, Allbirds Pivots to AI Compute

This Period at a Glance

Between April 14 and 17, the AI industry was nonstop: OpenAI released Codex as an all-purpose platform, along with GPT-Rosalind for life sciences and a cybersecurity model; Amazon reportedly made an $80 billion play for Anthropic while acquiring satellite company Globalstar; Google pushed both Gemini 3.1 Flash TTS and AI Mode in Chrome; and Allbirds made a wild pivot from sneakers to AI compute.

OpenAI Goes All-In: Codex, Rosalind, Cyber Codex for (Almost) Everything

Source: OpenAI

Anthropic Ships Remote Desktop Control via Dispatch, OpenAI Launches $100 Pro Tier

This digest covers April 10–12, 2026.

Anthropic Ships Dispatch, Letting Claude Take Over Your Mac

Source: https://www.therundown.ai/p/anthropic-claude-remote-computer-use-dispatch

Anthropic released a research preview that gives Claude direct control of your Mac desktop (clicking, typing, and navigating across apps) while you’re away from the keyboard. The companion Dispatch feature lets you send tasks from your phone for Claude to carry out on the computer. The system is designed with restraint: it checks for direct app integrations or browser access first and falls back to screen control only when necessary. It is currently limited to macOS users on Pro or Max plans via Cowork and Claude Code, with a Windows version in the works. Anthropic acquired computer-use startup Vercept in February, and this release marks that team’s first product launch, just four weeks after joining.

Anthropic Surpasses OpenAI with $30B ARR, Claude Mythos Shakes the Cybersecurity Industry

This issue covers news from April 7 to April 11, 2026.

Anthropic Surpasses OpenAI with $30B ARR

Source: https://www.latent.space/p/ainews-anthropic-30b-arr-project

Anthropic announced on April 7 that its annualized recurring revenue (ARR) has crossed $30 billion. Just a month earlier, on March 4, that number stood at $19 billion, an $11 billion jump in roughly a month. For comparison, OpenAI’s ARR sits at approximately $25 billion. By these figures, Anthropic has overtaken OpenAI in revenue scale.

US-Iran Talks Begin in Islamabad; Anthropic Mythos Triggers Wall Street Security Alert; Alibaba's HappyHorse Tops Global Video Generation Ranking

US-Iran Direct Talks Begin in Islamabad as Hormuz Strait Traffic Remains at Bare Minimum

Source: https://www.163.com/dy/article/KQ7G9B8R05198NMR.html

US and Iranian delegations held their first direct negotiations on April 11 in Islamabad, Pakistan, with the US side led by Vice President Vance. Trump said results would be clear within 24 hours, warning of intensified military action if talks fail. Iran has set two preconditions: a ceasefire in Lebanon and the unfreezing of Iranian assets. Traffic through the Strait of Hormuz remains below 10% of pre-conflict levels, with only 4 vessels passing in the last 24 hours. Lebanon and Israel have agreed to hold their first discussion of ceasefire arrangements at the US State Department on April 14.

CoreWeave Signs $21B AI Cloud Deal with Meta; Anthropic Delays Most Powerful Model Over Safety Concerns

CoreWeave Signs $21 Billion AI Cloud Agreement with Meta

Source: https://www.coreweave.com/news/coreweave-and-meta-announce-21-billion-expanded-ai-infrastructure-agreement

CoreWeave announced an expanded long-term agreement with Meta Platforms to provide AI cloud capacity through December 2032 for approximately $21 billion. This marks the second major deal between the two companies, following a $14.2 billion agreement signed last September. The dedicated capacity will be deployed across multiple locations and will include some of the first deployments of the NVIDIA Vera Rubin platform. This distributed approach is designed to optimize performance, resilience, and scalability for Meta’s AI operations.

Anthropic Launches Project Glasswing Zero-Day Scanning, Partners with Google and Broadcom for Gigawatt Compute

This issue covers news from April 5 to April 8, 2026.

Anthropic Launches Project Glasswing, Claude Mythos Discovers Thousands of Zero-Day Vulnerabilities

Source: https://www.anthropic.com/glasswing

Anthropic unveiled Project Glasswing, a security initiative developed in partnership with major tech companies. Claude Mythos Preview autonomously identified thousands of zero-day vulnerabilities across major operating systems and browsers. These capabilities will be used to detect and fix security vulnerabilities at scale. Anthropic plans to develop safeguards and broaden industry cooperation to address security challenges in the AI era.

Google Open-Sources Gemma 4 to Challenge Open Model Landscape, OpenAI Acquires TBPN Media Venture

Google Releases Gemma 4 Open Models, Switches to Apache 2.0 License

Source: https://www.latent.space/p/ainews-gemma-4-the-best-small-multimodal

Google DeepMind officially launched the Gemma 4 series on April 2. The release includes four model variants: a 31B dense model, a 26B MoE model (A4B with ~4B active parameters), and two lightweight edge models, E2B and E4B, designed for mobile and IoT devices. The headline change is the license: Gemma 4 adopts Apache 2.0, a dramatic shift from the commercial restrictions that constrained earlier Gemma releases. Developers can now freely modify, deploy, and commercialize these models without monthly active user caps or usage restrictions.

Anthropic Source Code Leak, OpenAI Raises $122B, Google Open-Sources Gemma 4

This issue covers news from April 1 to April 3.

Anthropic’s Rough Week: Claude Code Source Code Fully Exposed

Source: https://thenewstack.io/anthropic-claude-code-leak/

Anthropic has had a difficult week. On March 26, Fortune reported that a CMS configuration error exposed nearly 3,000 internal files, including a draft announcement for a new model codenamed “Mythos” (internally also called “Capybara”), described as the company’s “most capable AI model to date.” Less than a week later, on March 31, security researcher Chaofan Shou discovered that Anthropic had accidentally included a 59.8MB source map file in the Claude Code v2.1.88 npm package.

LeCun's $1B World Model Bet, Anthropic Sues U.S. Government

Yann LeCun’s $1B Challenge to LLMs: AMI Labs Launches

Source: https://amilabs.xyz/

Yann LeCun’s Advanced Machine Intelligence (AMI Labs) officially launched following his departure from Meta, raising $1.03 billion in a seed round at a $3.5 billion valuation. This is one of the largest AI seed rounds this year. LeCun left Meta in November after 12 years, telling Mark Zuckerberg he could build world models “faster, cheaper, and better” on his own. AMI’s systems aim to simulate how the physical world works, targeting manufacturing, robotics, wearables, and healthcare.