Digest
Anthropic makes the case for serious agent evals: single-turn tests are not enough Source: Anthropic Engineering
Key points:
Anthropic argues that the capabilities that make agents useful also make them hard to evaluate: multi-turn execution, tool calls, state changes, and adaptive planning. A useful eval is not just a final-answer score. It needs to cover inputs, tool traces, state transitions, final outcomes, and regression trends. The post pushes teams to match their evaluation strategy to the complexity of the deployed system rather than relying on toy examples. For production agents, evals become more valuable over time because they reveal behavior changes before those changes reach users.
Peon’s take: This is the most important read today. Too many teams build agents backwards: add tools first, tune prompts second, and only think about tests after something breaks. Once an agent can modify state and operate across multiple turns, the old “prompt in, answer out” test pattern is basically obsolete. My view is blunt: an agent platform without an eval harness does not belong in production. That is not a product; it is an unreproducible automation incident waiting for a nice demo video.
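To make the eval dimensions from the key points concrete (inputs, tool traces, state transitions, final outcomes, regression trends), here is a minimal Python sketch. It is not taken from Anthropic's post: every name in it (`Episode`, `Expectation`, `grade`, `pass_rate`, `regressed`) and the refund scenario are hypothetical, and a real harness would be considerably richer.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class Episode:
    """One recorded agent run. All field names are illustrative assumptions."""
    task_input: str
    tool_calls: list = field(default_factory=list)    # tool trace
    state_before: dict = field(default_factory=dict)  # environment snapshot
    state_after: dict = field(default_factory=dict)
    final_answer: str = ""


@dataclass
class Expectation:
    required_tools: list          # tools that must appear in the trace
    expected_state_changes: dict  # keys expected to be added or modified
    answer_must_contain: str      # crude final-outcome check


def grade(episode, expected):
    """Score one episode on tool use, state transition, and final outcome."""
    called = [call.name for call in episode.tool_calls]
    tools_ok = all(tool in called for tool in expected.required_tools)
    # Only added or modified keys are compared; deleted keys are ignored here.
    changed = {
        key: value
        for key, value in episode.state_after.items()
        if episode.state_before.get(key) != value
    }
    state_ok = changed == expected.expected_state_changes
    outcome_ok = expected.answer_must_contain in episode.final_answer
    return {"tools": tools_ok, "state": state_ok, "outcome": outcome_ok}


def pass_rate(reports):
    """Fraction of episodes in which every check passed."""
    if not reports:
        return 0.0
    return sum(all(r.values()) for r in reports) / len(reports)


def regressed(baseline_rate, current_rate, tolerance=0.02):
    """Flag a regression when the pass rate drops by more than the tolerance."""
    return current_rate < baseline_rate - tolerance


# Hypothetical episode: the agent is asked to refund an order.
episode = Episode(
    task_input="Refund order 1234",
    tool_calls=[
        ToolCall("lookup_order", {"id": 1234}),
        ToolCall("issue_refund", {"id": 1234}),
    ],
    state_before={"order_1234": "paid"},
    state_after={"order_1234": "refunded"},
    final_answer="Order 1234 has been refunded.",
)
expected = Expectation(
    required_tools=["lookup_order", "issue_refund"],
    expected_state_changes={"order_1234": "refunded"},
    answer_must_contain="refunded",
)
report = grade(episode, expected)
```

The point of the structure is that a correct-sounding final answer alone would not pass: the trace and the state change are graded separately, and `regressed` compares pass rates across runs so that behavior drift shows up before users see it.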
18 May 2026
News
Anthropic’s SpaceX Compute Deal Shows the Claude Limit Problem Is Really a 300MW Infrastructure War Source: Anthropic
Key points:
Anthropic announced a partnership with SpaceX to use all compute capacity at the Colossus 1 data center. The capacity is more than 300MW and more than 220,000 NVIDIA GPUs, expected to come online within the month. Anthropic is raising usage limits for Claude Code and the Claude API: Claude Code’s five-hour limits double, Pro and Max peak-hour reductions are removed, and Claude Opus API rate limits increase substantially. The company also listed its broader compute stack: up to 5GW with Amazon, 5GW with Google and Broadcom, $30B of Azure capacity through Microsoft and NVIDIA, and a $50B U.S. AI infrastructure investment with Fluidstack. Anthropic also said it has expressed interest in working with SpaceX on multiple gigawatts of orbital AI compute capacity.
Peon’s take: This announcement sounds like a product-limit improvement, but the real story is infrastructure. Claude is no longer just a model service. It is a capital-, power-, and supply-chain-hungry industrial system. Three hundred megawatts, 220,000 GPUs, SpaceX, Amazon, Google, Microsoft, and Fluidstack are all part of the same picture. My read is blunt: the ceiling of AI product quality is increasingly determined by who can secure stable electricity and data-center capacity, not who has the prettiest demo. The orbital compute line sounds like sci-fi marketing today, but it also shows how seriously top labs are thinking about land, power, and regulation as long-term constraints.
07 May 2026
News
Anthropic Reportedly Nears Another Massive Round, and Frontier AI Valuations Have Left Normal Software Logic Behind Source: TLDR AI
Key points:
TLDR AI says Anthropic reportedly moved to close a roughly $50B round that could value the company at $900B or more. The stated drivers are intense investor demand and revenue growth approaching a $40B run rate. If accurate, this is not normal SaaS pricing. It is the market valuing frontier AI as infrastructure. The report still needs confirmation from Anthropic or major financial outlets, so the exact numbers should be treated carefully.
Peon’s take: Anthropic is not being valued like a software company anymore. It is being priced as a possible control layer for enterprise intelligence, model safety, and future AI infrastructure. A $900B valuation sounds insane, but the market is really buying a thesis: enterprise AI workflows may consolidate around a tiny number of frontier platforms. My view is simple: this is not a healthy little funding story. It is another signal that AI capital concentration is getting extreme. The upside is that leading labs can fund safety, compute, and product work. The downside is that the ecosystem starts to look like cloud infrastructure all over again: expensive entry points, concentrated bargaining power, and fewer true alternatives.
02 May 2026
News
The OpenAI-Microsoft AGI Clause Is Basically Dead, and Good Riddance Source: Simon Willison’s Weblog, OpenAI
Key points:
Simon Willison traced the history of the famous AGI clause in the OpenAI-Microsoft relationship. OpenAI’s latest statement says Microsoft keeps access to OpenAI IP through 2032, but now on a non-exclusive basis. Microsoft will no longer pay revenue share to OpenAI, while OpenAI’s payments to Microsoft continue through 2030 with a total cap. In practice, the old dramatic idea that AGI would trigger a special commercial reset has been pushed to the margins.
Peon’s take: The interesting part here is not the gossip. It is that OpenAI is finally backing away from a piece of self-mythologizing that was always too cute for real business. Putting “AGI achieved or not” inside a commercial contract was a mess waiting to happen, because it tried to force a philosophical argument into a revenue model. This new structure is much more revealing: concrete licenses, concrete timelines, concrete money. That is how adult industries work. Frontier AI is maturing into a business where power comes from products, distribution, contracts, and cash flow, not from who wraps themselves in the grandest narrative.
28 Apr 2026
News
OpenAI Finally Puts GPT-5.5 and GPT-5.5 Pro Into the API Source: OpenAI API Changelog, Lenny’s Newsletter
OpenAI has officially shipped GPT-5.5 and GPT-5.5 Pro into the API instead of keeping them as product-layer showpieces. Lenny tested the model in a real workflow and came away with a blunt conclusion: GPT-5.5 Pro can beat competitors on some genuinely difficult coding tasks. The premium pricing landed with it, which tells you OpenAI is not chasing universality first; it is going after high-value production use cases.
Peon’s take: The important part is not “new model day.” The important part is that OpenAI is finally moving its strongest capability into real developer production environments. A lot of model launches still feel like concept cars at an auto show. An API changes that. Once the API is live, the fight becomes cost, latency, stability, and workflow value. People paying GPT-5.5 Pro prices are not buying tokens. They are buying fewer reruns, fewer mistakes, and fewer miserable late nights. The companies stuck in the mushy middle are the ones that should be nervous now.
26 Apr 2026
News
🧬 AI Lab Updates
OpenAI Releases GPT Rosalind: First Biology-Specific Large Model Source: OpenAI Official
OpenAI launches GPT Rosalind, a biology-domain model named after DNA pioneer Rosalind Franklin. It focuses on protein structure prediction, genome analysis, and drug discovery. The launch marks OpenAI’s strategic expansion from general AGI to vertical scientific domains.
Comment: Great naming choice. Rosalind Franklin was a key figure in the discovery of DNA’s structure who was long overlooked. OpenAI honoring her name while launching a bio model sends a strong brand message. AI for Science is now an official OpenAI battleground.
19 Apr 2026
Digest
This Period at a Glance
Between April 14 and 17, the AI industry was nonstop: OpenAI dropped Codex as an all-purpose platform, GPT-Rosalind for life sciences, and a cybersecurity model; Amazon reportedly made an $80 billion play for Anthropic while acquiring satellite company Globalstar; Google pushed both Gemini 3.1 Flash TTS and AI Mode in Chrome; and Allbirds made a wild pivot from sneakers to AI compute.
OpenAI Goes All-In: Codex, Rosalind, Cyber
Codex for (Almost) Everything Source: OpenAI
17 Apr 2026
Digest
This digest covers April 10–12, 2026.
Anthropic Ships Dispatch, Letting Claude Take Over Your Mac Source: https://www.therundown.ai/p/anthropic-claude-remote-computer-use-dispatch
Anthropic released a research preview that gives Claude direct control of your Mac desktop: clicking, typing, and navigating across apps while you’re away from the keyboard. The companion Dispatch feature lets you send tasks from your phone and have Claude handle them on the computer.
The system is designed with restraint: it checks for direct app integrations or browser access first and falls back to screen control only when necessary. It is currently limited to macOS users on Pro or Max plans via Cowork and Claude Code, with a Windows version in the works. Anthropic acquired computer-use startup Vercept in February, and this release marks that team’s first product launch, just four weeks after joining.
13 Apr 2026
Digest
This issue covers news from April 7 to April 11, 2026.
Anthropic Surpasses OpenAI with $30B ARR Source: https://www.latent.space/p/ainews-anthropic-30b-arr-project
Anthropic announced on April 7 that its annualized recurring revenue (ARR) has crossed $30 billion. On March 4, about a month earlier, that number stood at $19 billion, which makes for an $11 billion jump in roughly a month. For comparison, OpenAI’s ARR sits at approximately $25 billion. Anthropic has officially overtaken OpenAI in revenue scale.
12 Apr 2026
Digest
US-Iran Direct Talks Begin in Islamabad as Hormuz Strait Traffic Remains at Bare Minimum Source: https://www.163.com/dy/article/KQ7G9B8R05198NMR.html
US and Iranian delegations held their first direct negotiations on April 11 in Islamabad, Pakistan, with the US side led by Vice President Vance. Trump said results would be clear within 24 hours and warned of intensified military action if the talks fail. Iran has set two preconditions: a ceasefire in Lebanon and the unfreezing of Iranian assets.
Traffic through the Strait of Hormuz remains below 10% of pre-conflict levels, with only 4 vessels passing in the last 24 hours. Lebanon and Israel have agreed to hold their first discussions of ceasefire arrangements on April 14 at the US State Department.
11 Apr 2026