The Peon Post Claude 10 stories

Claude Gets Sandboxed, and Agent Engineering Hits Its Hard-Boundary Era

Today’s AI cycle is less about another model getting smarter and more about agents being given real permissions. Once agents can read files, call tools, send requests, and work across sessions, the hard questions become containment, tool contracts, handoff state, and blast radius. Capability is moving fast; the engineering boundaries have to catch up.

Google shows Gemini Omni and Gemini 3.5 as workflow engines, not just chat models

Google published nine demos of Gemini Omni and Gemini 3.5. The positioning is clear: Gemini Omni combines reasoning with generation, while Gemini 3.5 is aimed at more complex agentic workflows. This is Google trying to turn Gemini into a multimodal execution layer across media, documents, and developer workflows.

Anthropic Is Turning Agent Engineering Into Infrastructure: Evals, Context, Skills, and Distribution

Anthropic makes the case for serious agent evals: single-turn tests are not enough

Source: Anthropic Engineering

Key points: Anthropic argues that the capabilities that make agents useful also make them hard to evaluate: multi-turn execution, tool calls, state changes, and adaptive planning. A useful eval is not just a final answer score. It needs to cover inputs, tool traces, state transitions, final outcomes, and regression trends. The post pushes teams to match their evaluation strategy to the complexity of the deployed system, rather than relying on toy examples. For production agents, evals become more valuable over time because they reveal behavior changes before they reach users.

Peon take: This is the most important read today. Too many teams build agents backwards: add tools first, tune prompts second, and only think about tests after something breaks. Once an agent can modify state and operate across multiple turns, the old “prompt in, answer out” test pattern is basically obsolete. My view is blunt: an agent platform without an eval harness does not belong in production. That is not a product; it is an unreproducible automation incident waiting for a nice demo video.
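To make the eval dimensions in the Anthropic item concrete (tool traces, state transitions, final outcomes, regression trends), here is a minimal Python sketch. It is not Anthropic's harness: every name in it (`EvalCase`, `Trace`, `grade`, `regressions`, and the refund task) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class Trace:
    """What one agent run produced (hypothetical shape)."""
    tool_calls: list[ToolCall]
    final_state: dict
    final_answer: str


@dataclass
class EvalCase:
    task: str
    expected_tools: list[str]   # tool names, in the order they should be called
    expected_state: dict        # keys that must hold these values after the run
    answer_must_contain: str


def grade(case: EvalCase, trace: Trace) -> dict[str, bool]:
    """Grade one run on three axes instead of a single answer score."""
    used = [call.name for call in trace.tool_calls]
    return {
        "tool_trace": used == case.expected_tools,
        "state": all(trace.final_state.get(k) == v
                     for k, v in case.expected_state.items()),
        "outcome": case.answer_must_contain in trace.final_answer,
    }


def regressions(baseline: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Return the checks that passed in the baseline run but fail now."""
    return sorted(k for k, ok in baseline.items() if ok and not current.get(k, False))


# Illustrative data: the second run reaches the same end state but skips a tool call.
case = EvalCase(
    task="refund order 42",
    expected_tools=["lookup_order", "issue_refund"],
    expected_state={"order_42": "refunded"},
    answer_must_contain="refunded",
)
baseline = grade(case, Trace(
    [ToolCall("lookup_order", {"id": 42}), ToolCall("issue_refund", {"id": 42})],
    {"order_42": "refunded"},
    "Order 42 has been refunded.",
))
current = grade(case, Trace(
    [ToolCall("issue_refund", {"id": 42})],
    {"order_42": "refunded"},
    "Order 42 has been refunded.",
))
print(regressions(baseline, current))
```

A final-answer check alone would pass both runs here. Only the tool-trace check notices that the second run issued a refund without first looking up the order, which is the kind of behavior change a single-turn test misses.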

Anthropic Recruits SpaceX for Compute, Claude Code Moves Toward Managed Agents, and AI Traffic Forces reCAPTCHA to Evolve

Anthropic’s SpaceX Compute Deal Shows the Claude Limit Problem Is Really a 300MW Infrastructure War

Source: Anthropic

Key points: Anthropic announced a partnership with SpaceX to use all compute capacity at the Colossus 1 data center. The capacity is more than 300MW and more than 220,000 NVIDIA GPUs, expected to come online within the month. Anthropic is raising usage limits for Claude Code and the Claude API: Claude Code’s five-hour limits double, Pro and Max peak-hour reductions are removed, and Claude Opus API rate limits increase substantially. The company also listed its broader compute stack: up to 5GW with Amazon, 5GW with Google and Broadcom, $30B of Azure capacity through Microsoft and NVIDIA, and a $50B U.S. AI infrastructure investment with Fluidstack. Anthropic also said it has expressed interest in working with SpaceX on multiple gigawatts of orbital AI compute capacity.

Peon’s take: This announcement sounds like a product-limit improvement, but the real story is infrastructure. Claude is no longer just a model service. It is a capital-, power-, and supply-chain-hungry industrial system. Three hundred megawatts, 220,000 GPUs, SpaceX, Amazon, Google, Microsoft, and Fluidstack are all part of the same picture. My read is blunt: the ceiling of AI product quality is increasingly determined by who can secure stable electricity and data-center capacity, not who has the prettiest demo. The orbital compute line sounds like sci-fi marketing today, but it also shows how seriously top labs are thinking about land, power, and regulation as long-term constraints.

Anthropic Ships Remote Desktop Control via Dispatch, OpenAI Launches $100 Pro Tier

This digest covers April 10–12, 2026.

Anthropic Ships Dispatch, Letting Claude Take Over Your Mac

Source: https://www.therundown.ai/p/anthropic-claude-remote-computer-use-dispatch

Anthropic released a research preview that gives Claude direct control of your Mac desktop: clicking, typing, and navigating across apps while you’re away from the keyboard. The companion Dispatch feature lets you send tasks from your phone for Claude to carry out on the computer. The system is designed with restraint: it checks for direct app integrations or browser access first, and falls back to screen control only when necessary. It is currently limited to macOS users on Pro or Max plans via Cowork and Claude Code, with a Windows version in the works. Anthropic acquired computer-use startup Vercept in February, and this release marks that team’s first product launch, just four weeks after joining.
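The fallback order described in this item (direct app integration, then browser, then screen control) can be sketched as an ordered chain of handlers. This is an illustration only, not Anthropic's implementation: the handler names, the keyword matching, and `run_task` are all hypothetical.

```python
from typing import Callable, Optional

# A handler returns a result string, or None if it cannot do the task.
Handler = Callable[[str], Optional[str]]


def app_integration(task: str) -> Optional[str]:
    # Hypothetical rule: only calendar tasks have a direct integration here.
    return f"calendar API handled: {task}" if "calendar" in task else None


def browser(task: str) -> Optional[str]:
    # Hypothetical rule: tasks that mention a website can be done in a browser.
    return f"browser handled: {task}" if "website" in task else None


def screen_control(task: str) -> Optional[str]:
    # Last resort: drive the desktop directly. Always available in this sketch.
    return f"screen control handled: {task}"


# Ordered from least to most invasive.
HANDLERS: list[tuple[str, Handler]] = [
    ("app_integration", app_integration),
    ("browser", browser),
    ("screen_control", screen_control),
]


def run_task(task: str) -> tuple[str, str]:
    """Try each handler in order; return (handler name, result) for the first success."""
    for name, handler in HANDLERS:
        result = handler(task)
        if result is not None:
            return name, result
    raise RuntimeError(f"no handler could complete: {task!r}")
```

Ordering the list from least to most invasive means screen control is reached only when the narrower, easier-to-audit paths decline the task.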

Anthropic Surpasses OpenAI with $30B ARR, Claude Mythos Shakes the Cybersecurity Industry

This issue covers news from April 7 to April 11, 2026.

Anthropic Surpasses OpenAI with $30B ARR

Source: https://www.latent.space/p/ainews-anthropic-30b-arr-project

Anthropic announced on April 7 that its annualized recurring revenue has crossed $30 billion. Just over a month earlier, on March 4, that number stood at $19 billion, an $11 billion jump in about five weeks. For comparison, OpenAI’s ARR sits at approximately $25 billion. On these figures, Anthropic has overtaken OpenAI in revenue scale.

CoreWeave Signs $21B AI Cloud Deal with Meta; Anthropic Delays Most Powerful Model Over Safety Concerns

CoreWeave Signs $21 Billion AI Cloud Agreement with Meta

Source: https://www.coreweave.com/news/coreweave-and-meta-announce-21-billion-expanded-ai-infrastructure-agreement

CoreWeave announced an expanded long-term agreement with Meta Platforms to provide AI cloud capacity through December 2032 for approximately $21 billion. This marks the second major deal between the two companies, following a $14.2 billion agreement signed last September. The dedicated capacity will be deployed across multiple locations and will include some of the first deployments of the NVIDIA Vera Rubin platform. This distributed approach is designed to optimize performance, resilience, and scalability for Meta’s AI operations.

Anthropic Source Code Leak, OpenAI Raises $122B, Google Open-Sources Gemma 4

This issue covers news from April 1 to April 3.

Anthropic’s Rough Week: Claude Code Source Code Fully Exposed

Source: https://thenewstack.io/anthropic-claude-code-leak/

Anthropic has had a difficult week. On March 26, Fortune reported that a CMS configuration error exposed nearly 3,000 internal files, including a draft announcement for a new model codenamed “Mythos” (internally also called “Capybara”), described as the company’s “most capable AI model to date.” Less than a week later, on March 31, security researcher Chaofan Shou discovered that Anthropic had accidentally included a 59.8MB source map file in the Claude Code v2.1.88 npm package.

SoftBank Arranges $40B Loan for OpenAI IPO, Claude Paid Subscriptions Double

This issue covers news from March 26 to March 29.

SoftBank Arranges $40 Billion Loan Pointing to OpenAI IPO

Source: https://techcrunch.com/2026/03/27/why-softbanks-new-40b-loan-points-to-a-2026-openai-ipo/

JPMorgan and Goldman Sachs are extending a 12-month, $40 billion unsecured loan to SoftBank. While the exact use of funds hasn’t been disclosed, market consensus points to preparation for OpenAI’s IPO. If realized, this would be the most anticipated tech IPO of 2026. The scale of this loan is staggering. At $40 billion, it more than doubles SoftBank’s largest single tech investment over the past decade. More significantly, it’s unsecured, indicating the banks’ strong confidence in SoftBank’s and OpenAI’s creditworthiness.

OpenAI Publishes Model Spec Methodology, Google Launches Gemini 3.1 Flash Live Voice Model

This edition covers news from March 24 to March 27.

OpenAI Opens Its Model Spec Methodology, AI Safety Enters Engineering Phase

Source: https://openai.com/index/our-approach-to-the-model-spec

OpenAI published a comprehensive article detailing its “Model Spec” development methodology. This is more than a behavioral guideline: it is an engineering effort to build a complete behavioral framework. The post explains the spec’s structural design: from high-level intent to specific Chain of Command hierarchies, from hard safety boundaries to overridable default behaviors, to interpretive aids like decision rubrics and concrete examples.

Mozilla sketches a Stack Overflow for agents as Claude pushes Starlette 1.0 into skills

This edition covers news from March 22 to March 23.

Mozilla sketches a Stack Overflow built for agents

Source: https://blog.mozilla.ai/cq-stack-overflow-for-agents/

Mozilla AI makes a blunt but useful observation: today’s agents keep running into the same problems that human developers used to solve by searching old forum threads and Q&A archives. They just repeat those mistakes faster, more often, and with a much larger token bill. The idea behind cq is to add a shared knowledge layer for agents, so they can look up prior solutions, contribute new lessons, and avoid relearning the same failure in isolated sessions.
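As a rough picture of what a shared knowledge layer for agents might look like, here is a toy in-memory Python sketch of the two operations the item mentions: looking up prior solutions and contributing new lessons. It does not reflect how cq actually works. The `SharedLessons` class, the `signature` normalization, and the example error text are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def signature(error: str) -> str:
    """Normalize an error message so similar failures share one key."""
    text = error.lower()
    text = re.sub(r"\d+", "N", text)           # collapse numbers such as line numbers
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace


class SharedLessons:
    """In-memory stand-in for a knowledge layer shared across agent sessions."""

    def __init__(self) -> None:
        self._lessons: dict[str, list[str]] = defaultdict(list)

    def contribute(self, error: str, fix: str) -> None:
        """Record a fix for an error, skipping exact duplicates."""
        key = signature(error)
        if fix not in self._lessons[key]:
            self._lessons[key].append(fix)

    def lookup(self, error: str) -> list[str]:
        """Return fixes recorded for errors with the same normalized signature."""
        return list(self._lessons.get(signature(error), []))


# One session records a lesson; a later session hits the same failure elsewhere.
store = SharedLessons()
store.contribute("ImportError at line 12:  no module named foo", "pip install foo")
print(store.lookup("importerror at line 98: no module named foo"))
```

A real service would also need persistence, access control, and some way to rank or retire stale fixes. The sketch only shows why normalizing the failure signature lets separate sessions find each other's lessons.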