Coauthored with Claude
Agents are making the transition from performing tasks to running operations. The Cloudflare and Stripe partnership ships an agent that opens accounts, registers domains, and deploys an application on its own (details below), while Stripe/Tempo and iWallet have each published machine-to-machine payment protocols to make that kind of work routine. Office documents, browser sessions, and, in one announcement, the phone interface itself are next on the list. View the expanded role of agents as an opportunity for humans to accomplish more.
AI Models
The model menagerie keeps expanding in size and shape. Open weight contenders run at frontier capability on modest hardware, while specialist models for voice, conversation timing, and privacy filtering take over what were once features inside one general chat model. Treat your prompts and skills as portable; the model behind them will change.
- Anthropic has released Claude Opus 4.8. This model isn’t Mythos, which they expect to release soon. Opus 4.8 is a “modest improvement” that claims better results on coding and a greater likelihood of informing users when it’s uncertain about claims. Changes to the agents may be more important. Claude Code now has the ability to plan solutions to large problems involving hundreds of subagents (“dynamic workflows”); Cowork can control the effort put into solving a problem.
- Cohere’s Command A+ is an open weight mixture-of-experts model with 218B parameters, 25B active. It’s competitive with frontier models and requires relatively little hardware to run: Two H100s isn’t small, but it’s not a data center either.
- Google’s announcements at this year’s I/O conference include Omni, a new model that takes any kind of input (video, audio, image) and generates any kind of output; Gemini 3.5 Flash, a fast and efficient update to their coding model; Gemini Spark, a personal agent; and intelligent eyewear, another attempt at smart glasses.
- Alibaba has announced Qwen3.7-Max, its most capable model.
- Thinking Machines has announced a research preview of interaction models. These models support natural conversation flow. The model can wait for a speaker to finish, interrupt the speaker, respond when the speaker interrupts the model, and keep track of time.
- OpenAI has released new voice models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. They’re moving from call-and-response models to models that can take part in conversations, reason, and take actions.
- OpenRouter published cost studies for both Claude Opus 4.7 and GPT-5.5. GPT-5.5 raised the token price but reduced the number of tokens in a typical conversation. Claude kept prices the same, but conversations tend to require more tokens. What’s the impact on your monthly bill?
- Google has updated its Gemma 4 models, claiming that they triple token generation speed. They use a technique called multi-token prediction (MTP) to draft a sequence of tokens with a very small model and then approve those tokens with the large model.
- IBM released Granite 4.1, a collection of small models (30B parameters and down).
- An academic paper describes “the reasoning trap,” a phenomenon in which training models for increased reasoning also increases hallucinations about tool use.
- Talkie is an LLM that was trained only on data from 1931 and earlier. If you want to know what it was like to live during the start of the Depression, this is the LLM to ask.
- OpenAI has announced a privacy filter model. This is a small specialized model (1.5B) that can run on phones and other small devices. It removes personally identifiable information (PII) from text documents.
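The draft-then-approve idea mentioned in the Gemma item can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Google's implementation: `draft_model` and `target_model` are hypothetical toy functions standing in for the small and large models, decoding is greedy, and one "pass" of the large model is assumed to check a whole drafted batch at once (which is how such schemes usually save time).

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token prefix to the next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    # Hypothetical stand-in for the large model.
    return sum(prefix) % 7


def draft_model(prefix: List[int]) -> int:
    # Hypothetical stand-in for the small model: it agrees with the large
    # model except when the prefix length is a multiple of 4.
    guess = sum(prefix) % 7
    return (guess + 1) % 7 if len(prefix) % 4 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one model call per generated token.
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def draft_and_verify(
    draft: Model, target: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    tokens = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(tokens) < goal:
        # 1. The small model drafts k tokens, feeding on its own guesses.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The large model checks the draft. We count this as one pass,
        #    assuming a real system would score all k positions together.
        target_passes += 1
        for drafted in proposal:
            expected = target(tokens)
            tokens.append(expected)  # always keep the large model's token
            if expected != drafted or len(tokens) >= goal:
                break  # first mismatch (or done): discard the rest of the draft
    return tokens, target_passes
```

Because every kept token is the one the large model would have chosen, the output matches plain greedy decoding with the large model; the saving comes from needing fewer large-model passes whenever the small model guesses correctly.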
Software Development
We’re beginning to see anecdotal evidence that the brief era of tokenmaxxing is coming to an end. Agents may increase productivity, but they can also use tokens at an astonishing rate. So can the latest models, like Anthropic’s Claude Opus 4.8 with new features like dynamic workflows. Employers are realizing that the only way to measure productivity is to look at the quality of an employee’s work rather than relying on an artificial (and easily gameable) metric like token use. Teams that use AI effectively will be disciplined about token use; they’ll choose lower cost (or local) models where possible, reaching for expensive models like Claude Opus 4.8 only when necessary.
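A back-of-the-envelope sketch shows why token discipline matters: the bill depends on both the price per token and the number of tokens a conversation consumes, so a price increase can still lower costs if usage drops enough. The function name and every number below are hypothetical, chosen only to illustrate the arithmetic; they are not real prices for any model.

```python
def monthly_cost(
    conversations: int,
    tokens_per_conversation: int,
    usd_per_million_tokens: float,
) -> float:
    """Estimated monthly spend in US dollars (illustrative only)."""
    total_tokens = conversations * tokens_per_conversation
    return total_tokens * usd_per_million_tokens / 1_000_000


# Hypothetical scenario: the price rises 20% while tokens per conversation fall 25%.
before = monthly_cost(10_000, 8_000, 10.0)
after = monthly_cost(10_000, 6_000, 12.0)
print(f"before: ${before:.2f}, after: ${after:.2f}")
```

In this made-up scenario the higher-priced model ends up cheaper overall, which is why it makes sense to measure cost per conversation rather than price per token.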
- The Agentic AI Foundation is updating the MCP protocol, with a release candidate scheduled for July 28. Changes include making MCP a stateless protocol, adding a process for creating extensions, and aligning authorization with the OAuth and OpenID standards.
- Google is dropping Gemini CLI and putting all of its effort behind Antigravity, its agentic software development platform. There are desktop and command line versions of Antigravity, but unlike Gemini CLI, neither is open source.
- What shall we call Gas City, created by Julian Knutsen and Chris Sells? Gas Town 2.0? Steve Yegge says it’s an SDK for building your own “dark factories” by deploying teams of collaborating agents in any topology. It’s “a pivotal moment in the Mad Max school of agent orchestration.”
- The problem with agentic programming is that agents serve individuals, not groups, and programming is a team sport. Is collaborative steering (context management for groups) an answer?
- GitHub has released a preview of its Copilot app, a stand-alone desktop application for coding with AI. It’s completely integrated with GitHub; for example, you can launch tasks directly from GitHub issues.
- If you think tokenmaxxing is your path to promotion, check out burn-baby-burn. It does what it says: burns lots of tokens, fast, using the LLM of your choice. We hope it’s a parody, but we bet it works.
- Mitchell Hashimoto tweets that Anthropic’s rewrite of Bun from Zig to Rust demonstrates that programming languages are now fungible. Programming language lock-in has ended; programs can easily move from one language to another.
- OpenShell is a runtime environment built with security in mind from the ground up. It’s intended to be used as a secure environment for running agents. Each agent runs in its own sandbox; an external gateway manages credentials and policies.
- OpenAI is shutting down its API for fine-tuning its models. They say the current models are better and don’t require significant fine-tuning. As Latent Space points out, this doesn’t necessarily mean the end of fine-tuning as a discipline, particularly for open models. But it may be a signal. Drew Breunig writes about what this means for agents and harnesses.
- Anthropic has released Claude for Office 365, allowing users to run sessions that cross Word, Excel, and PowerPoint. Integration with Outlook is coming, though Claude for Outlook is currently a separate product.
- A plugin for Chrome allows Codex to use Chrome for browser tasks that require you to be logged in, such as reading email.
- Firecrawl is an API that agents can use to interact with websites the way a human would. It enables agents to search for the latest data, interact with the site, and return the results at scale.
- Drew Breunig’s “10 Lessons for Agentic Coding” is a helpful list of tips, including “Implement to learn.” Letting an agent write all the code is easy, but if you really want to learn something, write it by hand first.
- Deepclaude configures Claude’s autonomous agent loop to use DeepSeek V4 Pro rather than one of Anthropic’s models. It’s a good way to save money (DeepSeek costs much less per token) and experiment with open models. (Fair warning: The name deepclaude may change.)
- OpenAI has announced Codex for Work, an assistant that’s designed for office work rather than software development.
- Kanwas is a new tool for sharing context across agents. It can be used by workgroups to collaborate on projects.
- Mike is an open source AI trained for legal work and designed to run locally.
- GitHub is transitioning to usage-based billing for Copilot.
- OpenAI and Qualcomm are reportedly working on a phone where the user interface is an agent. There won’t be any apps; the agent will do everything.
Infrastructure and Operations
The infrastructure questions of the moment are whether agents can transact and deploy without humans, and whether the platforms that host open source can stay reliable enough to keep that work going. Watch for GitHub alternatives to become competitive. And watch Together AI, a cloud company that hosts hundreds of open source models.
- TokenTuner helps control AI costs by identifying where companies can use lower-cost models productively. It attempts to match token usage to business outcomes, and evaluates individuals and teams on how effectively they use their token budget.
- In partnership with Stripe, Cloudflare now has an agent that can create a new account, start a subscription, register a domain name with DNS, and deploy an application without human intervention other than granting permission.
- Stripe and Tempo have released the Machine Payments Protocol (MPP), and iWallet has laid out a roadmap for the Autonomous Settlement Protocol (ASP). These new protocols are designed to facilitate machine-to-machine transactions: transactions that have to be designed without a human in the loop.
- The Inference Era is when inference, rather than training, drives AI usage, cost, and infrastructure. GPUs remain important, but the relative demand for CPUs increases.
- GitHub is in danger of losing its place at the center of the open source ecosystem. Problems with uptime are causing projects to find homes elsewhere, most recently Ghostty.
- Together AI operates a cloud AI platform that’s designed specifically for inference rather than training and that provides API access to over 200 open weight models. As AI use increases, the ability to run models and provide answers efficiently becomes more important than the ability to train new models.
Security
The patch window is shrinking to zero, and the attacker’s toolkit and the defender’s toolkit now include the same AI models. Any vulnerability disclosed today is being exploited tonight. The good news is that defenders running these tools at scale can close gaps faster than ever; the bad news is that the race never ends.
- FROST is a new technique for surreptitiously discovering what websites a user is visiting. It’s based on measuring the I/O operations on the user’s SSD. FROST requires no interaction from the user and runs entirely in the browser.
- Regrettably, neither arcane prompt injection attacks nor cryptocurrency scams are news. But it warms a ham radio enthusiast’s heart to see Morse code used in a prompt injection to scam a crypto trading bot.
- TeamPCP, a cybercriminal collective, has attacked GitHub by installing a poisoned extension to VS Code. GitHub announced that almost 4,000 repositories were compromised, all belonging to GitHub itself; no customer repositories have been affected. But anyone who installs corrupted code from GitHub’s own repositories is vulnerable.
- No Security Meter for AI provides an excellent look into the state of AI security.
- Cloudflare’s report on Project Glasswing and Claude Mythos is worth reading. Mythos is especially noteworthy for its ability to chain vulnerabilities. In real life, few vulnerabilities are exploitable on their own; they become exploitable when they are used in combination with others.
- Daniel Stenberg reports that Mythos found five potential vulnerabilities in curl, of which one was legitimate. The low count isn’t surprising, given the quality of the curl team’s work. What’s significant is that Mythos was able to find a legitimate vulnerability in software that had been thoroughly audited by humans, traditional tools, and AI.
- Who showed up? A security researcher ran a honeypot with port 22 open for 54 days and logged every attempt to log in: 269,000 connection attempts from 7,556 unique IP addresses.
- GitHub’s dependency scanning service for its MCP server is now in public preview. It checks code changes for vulnerable dependencies before committing code or opening a pull request.
- Copy.fail is a recently discovered Linux kernel vulnerability that allows unprivileged processes to escalate privileges, and it was exploited within a day of its release. Unlike with most vulnerabilities, running infected programs in a container doesn’t offer protection. The time from release of a zero-day to exploitation in the wild is definitely shrinking.
- OpenAI’s Advanced Account Security requires a physical key or passkey for access; there are no passwords. The hardware key can come from Yubico or be any compatible hardware token.
- GPT-5.5 Cyber is a version of GPT-5.5 that has been trained as a security tool. As Anthropic did with Mythos, OpenAI is limiting access to a small group of trusted users.
- The Firefox team has used Claude Mythos to find 271 previously unknown vulnerabilities in Firefox. While this finding is terrifying, they conclude that defenders now have the advantage. Once you know the vulnerabilities, it’s possible to close the gap between defenders and attackers.
- Claude Code can leak credentials and other secrets to public repos and package registries. When you select “allow always” for a specific command, the command and its credentials are saved in a subdirectory of .claude. This directory can inadvertently be included in a package.
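One simple precaution against the .claude leak described in the last item is to look for that directory before publishing a package. The sketch below is a minimal pre-publish check, not an official tool: the function name `find_claude_dirs` is hypothetical, and the only thing it relies on from the item above is that the sensitive data lives under a directory named `.claude`.

```python
from pathlib import Path
from typing import List


def find_claude_dirs(root: str) -> List[Path]:
    """Return every directory named .claude at or below root."""
    # rglob matches names that start with a dot, so hidden directories are found.
    return sorted(p for p in Path(root).rglob(".claude") if p.is_dir())


def check_before_publish(root: str) -> bool:
    """Print a warning for each .claude directory; return True if none are found."""
    hits = find_claude_dirs(root)
    for hit in hits:
        print(f"warning: {hit} may contain saved commands and credentials")
    return not hits
```

Running `check_before_publish(".")` from the package root as part of a release script would flag the directory; excluding it through the packaging tool's ignore mechanism is the more durable fix.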
Policy and Governance
- The arXiv preprint repository has clarified its code of conduct for AI users. Submitters are responsible for their papers and will be banned for a year if they submit papers that use AI-generated content inappropriately. This includes hallucinated content, references, and plagiarism.
- Look to China for new approaches to data governance. China is treating data as a national resource and building the infrastructure for a data economy.
Internet
- At its I/O conference, Google announced that traditional search will be replaced by AI search, powered by Gemini 3.5 Flash. Both AI search and traditional search (which is really AI-powered) have proven useful. What happens when you eliminate one of the options?
- Linux running in a PDF? The PDF format supports JavaScript, and C can be compiled to JavaScript.
Biology
- Colossal Biosciences has developed a 3D-printed artificial eggshell that’s capable of raising chicks from embryos.
- Brazil has invested heavily in vaccines and has created a single-shot vaccine against dengue fever. The country is striving for “medical sovereignty,” a concept that’s clearly related to data sovereignty and AI sovereignty.
