Agentic AI

Google I/O 2026 Was Not Just a Model Launch. It Was Google Showing the Agent Stack.

Ken Huang
May 20, 2026

Google I/O 2026 had plenty of demos. There were cinematic media demos, AI-first product surfaces, glasses, search redesigns, and the usual benchmark slides. The useful read is simpler and more technical: Google is trying to turn Gemini from a chatbot family into a distributed agent runtime. The center of gravity is not one product. It is the stack formed by Gemini 3.5 Flash, Antigravity, Gemini Spark, AI Mode in Search, Gemini Omni, Chrome, and developer-facing managed agents.

Here is the paid-section promise up front. The free section will give you the map: what shipped, what matters, and what to ignore. The paid section goes deeper into the technical readout: why “Flash” no longer means low-end, why benchmark wins are less important than throughput-per-task, how Antigravity turns agents into an execution substrate, why Search-generated mini-apps could change software distribution, where Gemini Omni is genuinely hard, and what builders should actually do next.

The signal is that Google is stitching together models, tools, sandboxes, browsers, Search, Workspace, Android, Chrome, and Cloud into one agentic operating layer. That sounds abstract until you line up the announcements. Gemini 3.5 Flash is the reasoning and action model. Antigravity is the orchestration harness. Managed Agents give developers cloud-hosted execution environments. Spark brings background agents into the Gemini app and Workspace. Search uses the same model-and-harness combination to build custom interfaces, dashboards, and trackers. Gemini Omni extends the same ambition into multimodal creation. Chrome and WebMCP try to make the web legible to agents.

Taken together, this is Google’s attempt to make the browser, the search box, the IDE, the phone, and the productivity suite share a common agent layer.

Figure 1. Overall picture of Google Agent Offerings

The one-page map

Google’s own language around I/O 2026 is “the agentic Gemini era.” Sundar Pichai framed the keynote around Gemini products, conversational AI, infrastructure, models, and agents, not around a single isolated model release. The developer announcement used a similar phrase: Google is moving “from prompts to action” with Gemini 3.5 Flash, Antigravity 2.0, Managed Agents in the Gemini API, and AI Studio updates.

That phrasing matters because it tells you what Google thinks the next product boundary is. The old boundary was the app: Search, Gmail, Docs, YouTube, Android Studio. The new boundary is the task. A user asks for a result. The system chooses tools, retrieves context, runs code, uses browser state, calls APIs, writes files, and asks for approval when needed. That is why the announcements are so connected.

Figure 2. The I/O 2026 agent stack.

The mistake is to cover these as separate product updates. They are separate surfaces for the same architecture.

Gemini 3.5 Flash is the center of gravity

Google introduced Gemini 3.5 Flash as the first Gemini 3.5 release and positioned it as a model built for complex agentic workflows. The Google Cloud summary calls it Google’s strongest agentic and coding model yet. According to the DeepMind model card, it is based on Gemini 3 Flash and adds thinking levels for trading off quality, cost, and latency. It accepts text, images, audio, and video files as input, supports a context window of up to 1 million tokens, and outputs text with a 64,000-token limit.

Those details explain the product strategy better than the keynote phrasing. A consumer assistant can tolerate some latency. A background agent running a long workflow cannot. A coding agent has to plan, edit files, run tests, inspect failures, revise, and present evidence. A Search agent generating a custom interface has to reason, retrieve, synthesize, and render quickly enough that the user does not abandon the session. A model used inside an orchestrated agent harness has to be evaluated less like a single answer machine and more like a component in a loop.

Google’s headline benchmarks for Gemini 3.5 Flash include Terminal-Bench 2.1 at 76.2%, GDPval-AA at 1656 Elo, MCP Atlas at 83.6%, and CharXiv Reasoning at 84.2%. The model card adds agentic and real-world benchmarks: SWE-Bench Pro Public at 55.1%, OSWorld-Verified at 78.4%, Toolathlon at 56.5%, Finance Agent v2 at 57.9%, and MRCR v2 at 77.3% in the 128k average setting and 26.6% in the 1M pointwise setting.

Figure 3. Gemini 3.5 Flash benchmark readout.

Artificial Analysis had pre-release access and reports that Gemini 3.5 Flash with high thinking scored 55 on its Intelligence Index, nine points above Gemini 3 Flash, with gains coming mainly from agentic performance and hallucination reduction. It also reports output speed above 280 tokens per second and pricing at $1.50 per million input tokens and $9.00 per million output tokens, with a 90% cached-input discount. The cost caveat is important: Artificial Analysis says running its Intelligence Index on the model costs $1,552, which is 5.5 times the cost of Gemini 3 Flash and 75% more than Gemini 3.1 Pro on the same test suite.
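To make the pricing concrete, here is a minimal sketch of the per-request arithmetic using the prices quoted above. The function name, the cached_fraction parameter, and the token counts in the usage lines are illustrative assumptions, not part of any Google or Artificial Analysis API. The two "implied" baselines are simply derived from the reported ratios and are not figures Artificial Analysis published.

```python
# Prices as quoted above (USD per million tokens).
INPUT_PRICE_PER_M = 1.50
OUTPUT_PRICE_PER_M = 9.00
CACHED_INPUT_DISCOUNT = 0.90  # cached input tokens billed at 10% of list price


def request_cost(input_tokens: int, output_tokens: int,
                 cached_fraction: float = 0.0) -> float:
    """Estimated USD cost of one call.

    cached_fraction is a hypothetical knob: the share of input tokens
    that are served from cache and therefore get the discount.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    billable_input = fresh + cached * (1.0 - CACHED_INPUT_DISCOUNT)
    total = billable_input * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Baselines implied by the reported ratios (derived here, not reported figures).
INDEX_RUN_COST = 1552.0
implied_gemini_3_flash = INDEX_RUN_COST / 5.5   # "5.5 times Gemini 3 Flash"
implied_gemini_31_pro = INDEX_RUN_COST / 1.75   # "75% more than Gemini 3.1 Pro"

if __name__ == "__main__":
    # Hypothetical agent step: 200k tokens of context in, 8k tokens out.
    print(f"no cache:  ${request_cost(200_000, 8_000):.3f}")
    print(f"80% cache: ${request_cost(200_000, 8_000, cached_fraction=0.8):.3f}")
    print(f"implied index cost, Gemini 3 Flash: ${implied_gemini_3_flash:.0f}")
    print(f"implied index cost, Gemini 3.1 Pro: ${implied_gemini_31_pro:.0f}")
```

For a long-context agent step like the hypothetical one above, the cached-input discount cuts the bill by more than half, which is why caching matters more than list price once an agent re-sends the same context on every loop iteration.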

That makes Gemini 3.5 Flash more interesting, not less. The model is not simply a cheap, small, “flashy” option. It is closer to a deployment-optimized frontier model: fast enough to run inside agents, strong enough to do nontrivial work, and configurable through thinking levels. For builders, that changes the question. You do not ask, “Is this the smartest model?” You ask, “Does this model complete agentic workflows faster and cheaper than a slower flagship model once retries, tool failures, and human review are included?”
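One rough way to frame that question is expected cost and time per completed task rather than per call. The sketch below assumes attempts are independent, that a failed attempt is retried until one succeeds, and that a human reviews every attempt. The ModelProfile type and every number in the usage lines are hypothetical placeholders, not measurements of any real model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-attempt profile of a model inside an agent loop."""
    name: str
    cost_per_attempt: float      # USD of model and tool spend per attempt
    seconds_per_attempt: float   # wall-clock time per attempt
    success_rate: float          # probability an attempt passes review


def expected_attempts(profile: ModelProfile) -> float:
    # Retrying until success with independent attempts is a geometric
    # process, so the expected number of attempts is 1 / p.
    if not 0.0 < profile.success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / profile.success_rate


def cost_per_completed_task(profile: ModelProfile,
                            review_cost_per_attempt: float) -> float:
    """Expected USD per finished task, counting retries and human review."""
    per_attempt = profile.cost_per_attempt + review_cost_per_attempt
    return per_attempt * expected_attempts(profile)


def seconds_per_completed_task(profile: ModelProfile) -> float:
    """Expected wall-clock seconds per finished task, counting retries."""
    return profile.seconds_per_attempt * expected_attempts(profile)


if __name__ == "__main__":
    # All figures below are made up for illustration.
    fast = ModelProfile("fast-model", 0.40, 90.0, 0.70)
    flagship = ModelProfile("flagship-model", 1.20, 300.0, 0.85)
    for m in (fast, flagship):
        print(m.name,
              round(cost_per_completed_task(m, review_cost_per_attempt=0.50), 2),
              round(seconds_per_completed_task(m)))
```

With these placeholder inputs the faster model wins on both axes despite a lower success rate. The conclusion flips if review is expensive enough or the success-rate gap is wide enough, which is the point of measuring per task instead of per token.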

Antigravity is the real platform announcement

Antigravity is the part that makes the model announcement operational. Google introduced Antigravity in 2025 as an agentic development platform that combines an AI-powered editor with a manager surface where users can spawn, orchestrate, and observe agents working asynchronously across workspaces. Agents can plan, execute, and verify tasks across an editor, terminal, and browser. The important word is verify. Without verification, autonomous coding is just fast guesswork.
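The plan, execute, verify pattern can be sketched in a few lines. This is a generic illustration of the loop, not Antigravity's implementation. The function name, the callables, and the retry policy are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

StepResult = Tuple[str, str, bool]  # (step, last output, verified?)


def run_with_verification(plan: List[str],
                          execute: Callable[[str], str],
                          verify: Callable[[str, str], bool],
                          max_retries: int = 2) -> List[StepResult]:
    """Run each planned step, accepting it only once verify() passes.

    execute and verify are caller-supplied stand-ins for real tool calls
    (editing a file, running tests, checking a page in a browser).
    """
    results: List[StepResult] = []
    for step in plan:
        output = ""
        verified = False
        for _ in range(max_retries + 1):
            output = execute(step)
            if verify(step, output):
                verified = True
                break
        results.append((step, output, verified))
        if not verified:
            # Stop instead of building later steps on unverified work.
            break
    return results


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_execute(step: str) -> str:
        attempts["count"] += 1
        return f"{step}: attempt {attempts['count']}"

    def passes_after_first_try(step: str, output: str) -> bool:
        return not output.endswith("attempt 1")

    for row in run_with_verification(["edit file", "run tests"],
                                     flaky_execute, passes_after_first_try):
        print(row)
```

The design choice worth noting is the early stop: a step that never verifies halts the run, so the reviewer sees one clear failure instead of a chain of results built on a bad foundation.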

At I/O 2026, Google expanded Antigravity into a broader platform. Antigravity 2.0 is a standalone desktop application intended as a central home for agent interaction, with multiple agents, dynamic subagents, scheduled tasks, and integrations across Google AI Studio, Android, and Firebase. Google also announced Antigravity CLI for terminal-first users and Antigravity SDK for developers who want programmatic access to the same harness and want to host custom agents on their own infrastructure.

This is the technical shift: Antigravity is not “AI code completion.” It is a work orchestration layer. The old IDE model assumes the developer is the active executor and the AI is a helper. Antigravity’s manager surface assumes the developer becomes a reviewer, dispatcher, and debugger of agent work. That is why artifacts matter. Google says Antigravity agents generate task lists, implementation plans, screenshots, and browser recordings so users can verify results without reading raw tool logs.

That design is correct. Logs are useful for debugging. They are bad as the primary trust interface. A human needs to know what the agent tried, what changed, what evidence supports the result, and what remains uncertain. Screenshots and recordings are not cosmetic; they are part of the verification protocol.
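Those four questions map naturally onto a small record type. The sketch below is a hypothetical shape for such a report, not a format Google has published. The field names and the reviewability rule are assumptions chosen to illustrate the idea that a claimed change without evidence should not reach a reviewer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskReport:
    """Hypothetical review artifact for one unit of agent work."""
    goal: str
    attempted: List[str] = field(default_factory=list)       # what the agent tried
    changed_files: List[str] = field(default_factory=list)   # what changed
    evidence: List[str] = field(default_factory=list)        # screenshots, recordings, test output
    open_questions: List[str] = field(default_factory=list)  # what remains uncertain

    def is_reviewable(self) -> bool:
        # Assumed rule: the agent must have done something, and any claimed
        # change must come with at least one piece of evidence.
        if not self.attempted:
            return False
        return not self.changed_files or bool(self.evidence)

    def summary(self) -> str:
        lines = [f"Goal: {self.goal}"]
        sections = [("Tried", self.attempted),
                    ("Changed", self.changed_files),
                    ("Evidence", self.evidence),
                    ("Uncertain", self.open_questions)]
        for title, items in sections:
            lines.append(f"{title}: {', '.join(items) if items else 'none'}")
        return "\n".join(lines)


if __name__ == "__main__":
    report = TaskReport(
        goal="Fix login redirect",
        attempted=["patched redirect handler", "ran auth tests"],
        changed_files=["auth/redirect.py"],
        evidence=["artifacts/login-flow.webm", "artifacts/test-output.txt"],
        open_questions=["SSO path not exercised"],
    )
    print(report.is_reviewable())
    print(report.summary())
```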

The same pattern appears in Managed Agents. Google says Managed Agents in the Gemini API let developers spin up an agent with a single API call that reasons, uses tools, and executes code in an isolated Linux environment, powered by Antigravity and Gemini 3.5 Flash. For enterprise and cloud customers, Google also links the agent story to Gemini Enterprise, Agent Platform, Workspace, custom connectors, and CodeMender integration.

That is the platform move. Google is selling not just a model API, but a hosted environment where agents can run, hold context, use tools, execute code, produce artifacts, and wait for approvals.

Spark and Search make agents consumer-facing

Gemini Spark is the consumer version of that same architecture. Google describes Spark as a 24/7 personal AI agent that helps users navigate digital life, powered by Gemini 3.5 and the Antigravity harness. The Gemini app announcement says Spark is cloud-based, keeps working in the background when a laptop is closed or a phone is locked, and integrates with Workspace tools such as Gmail, Docs, and Slides. Google says users choose whether to turn it on and which apps it connects to, and Spark is designed to ask first before high-stakes actions such as spending money or sending emails.
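The ask-first behavior is easy to picture as a gate in front of the action executor. The following is a generic sketch of that pattern, not Spark's actual mechanism. The Action type, the HIGH_STAKES set, and the callback names are assumptions, and the two entries in the set only mirror the examples Google gives (spending money and sending emails).

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumed policy: only the two examples named in the announcement.
HIGH_STAKES = {"spend_money", "send_email"}


@dataclass(frozen=True)
class Action:
    kind: str
    description: str


def execute_with_approval(action: Action,
                          perform: Callable[[Action], None],
                          ask_user: Callable[[Action], bool]) -> str:
    """Run low-stakes actions directly; ask before high-stakes ones."""
    if action.kind in HIGH_STAKES and not ask_user(action):
        return "declined"
    perform(action)
    return "done"


if __name__ == "__main__":
    performed: List[str] = []

    def perform(action: Action) -> None:
        performed.append(action.description)

    def always_decline(action: Action) -> bool:
        return False

    print(execute_with_approval(Action("summarize", "family digest"),
                                perform, always_decline))
    print(execute_with_approval(Action("send_email", "project update"),
                                perform, always_decline))
    print(performed)
```

Keeping the policy in one set and the prompt in one callback means the gate can be audited separately from the agent's reasoning, which is where a user-facing trust guarantee has to live.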

The use cases Google gives are not exotic. Parse monthly credit card statements to detect new subscription fees. Monitor school emails and produce a family digest. Synthesize meeting notes across emails and chats into a polished Google Doc and draft the project email. That is exactly the kind of work where a background agent makes sense: repetitive, context-heavy, cross-app, and valuable enough to justify occasional review.

Search is the other consumer surface. Google says AI Mode has passed 1 billion monthly users, that Search is being upgraded with Gemini 3.5 Flash as the default AI Mode model globally, and that the Search box is becoming an expanded AI interface accepting text, images, files, videos, and Chrome tabs. Google also announced information agents in Search that can run 24/7 in the background and monitor the web and fresh data sources. These agents will roll out first to Google AI Pro and Ultra subscribers.

More importantly, Search will use Gemini 3.5 Flash and Antigravity to generate custom interfaces, visual tools, simulations, dashboards, trackers, and mini-apps. Generative UI is planned for everyone in Search this summer at no charge, while persistent custom experiences with Antigravity begin first for Google AI Pro and Ultra subscribers in the U.S.

That is the end of the free map. The real question is not whether the demos looked good. The real question is whether Google is building the first mass-market agent runtime by hiding it inside products people already use. The paid section is the technical readout: architecture, economics, Omni, Search-generated software, and the builder playbook.

Paid Section
