Google Cloud Next 2025: Ironwood TPU, Agent Toolkits, and Google’s vertical advantage
Key Takeaways:
- Google Cloud unveiled Ironwood (TPUv7), its first inference-optimized TPU, offering massive leaps in compute (5x), HBM capacity (6x), and power efficiency (2x) over the previous generation. Performance is estimated to be within 5% of an Nvidia B200, despite TPUv7’s single primary compute die design.
- The new Agent Development Kit (ADK) and Agent2Agent (A2A) protocol were introduced to simplify agent development and enable open collaboration between AI agents from different vendors.
- Google is fostering a rich agent ecosystem with partner support and tools like Agent Garden, integrated within environments like Google Agentspace.
- A strong focus on vertical advantage is evident, with industry-specific agent examples, partnerships, and tailored solutions built on the Vertex AI platform.
Summary of News:
- Ironwood TPU (TPUv7) announced: Google’s 7th-gen TPU, optimized for inference. Features a single primary compute die (likely ~700mm²) on TSMC N3P with CoWoS packaging (Broadcom), consuming ~1kW (liquid-cooled). Offers 5x compute, 6x HBM capacity, 4.5x HBM bandwidth, and 2x performance/watt vs. Trillium. Performance estimated near the Nvidia B200. Available later in 2025 (256- and 9,216-chip configs). Powers AI Hypercomputer. TPUv7e variant with MediaTek expected later.
- Agent Development Kit (ADK) launched: New open-source framework to simplify building multi-agent systems, integrating with MCP, Apigee, and Agent Garden (100+ connectors/samples).
- Agent2Agent (A2A) protocol introduced: First-of-its-kind open protocol from a hyperscaler enabling cross-vendor agent communication and collaboration, built on open standards. Supported by 50+ partners.
- Google Agentspace highlighted: Environment for users to interact with agents, featuring pre-built connectors for enterprise data systems (Confluence, Jira, etc.) for secure, grounded insights.
- Vertical AI Solutions emphasized: Showcased numerous industry-specific use cases (Automotive, Retail, Healthcare) and purpose-built agents, leveraging Vertex AI and partnerships. Expanded grounding capabilities with third-party data sources.
What’s Significant:
Google Cloud Next 2025 signals a major push into the “age of inference,” underpinned by the powerful and efficient Ironwood TPU. This focus on inference-optimized hardware, built around a simpler single primary compute die in contrast to Nvidia’s two-die B200 chiplet approach, addresses the growing need to deploy sophisticated “thinking” AI models cost-effectively at scale.
The introduction of the ADK and, more critically, the open A2A protocol represents a significant bet on a future dominated by multi-agent systems. By fostering an open standard for agent collaboration, Google aims to unlock innovation and prevent vendor lock-in, potentially catalyzing the development of complex, automated workflows powered by specialized agents working together.
Furthermore, the clear strategy towards verticalization demonstrates Google’s understanding that AI value is maximized when tailored to specific industry needs and data contexts. By combining its powerful platform (Vertex AI, AI Hypercomputer), specialized hardware (Ironwood), agent frameworks (ADK/A2A), and deep industry partnerships, Google is building a comprehensive offering designed to embed AI deeply within the core operations of businesses across all sectors. This multi-pronged approach solidifies Google’s position as a leading force driving the next wave of AI adoption.
At Google Cloud Next 2025, Google charted an ambitious course for the future of artificial intelligence, showcasing significant advancements across its custom silicon, developer tools, and strategic market focus. The announcements paint a picture of an end-to-end AI powerhouse, pushing the boundaries from massive data center infrastructure to sophisticated, collaborative AI agents tailored for specific industries. Google’s relentless innovation in AI is clearly accelerating, aiming to solidify its leadership in the rapidly evolving landscape.
Perhaps the most striking hardware revelation was the Ironwood TPU (TPUv7), Google’s seventh-generation Tensor Processing Unit. Marking a pivotal shift, Ironwood is the first TPU specifically optimized for inference, designed to power the “thinking” models that generate insights and interpretations, heralding what Google calls the “age of inference.”
Architecturally, TPUv7 is believed to feature a single primary compute die, likely around 700mm², manufactured on TSMC’s N3P node and packaged using CoWoS technology, reportedly in partnership with Broadcom. A future variant, TPUv7e, is rumored to involve MediaTek. This single-die approach for the core compute contrasts with competitors like Nvidia’s B200, which utilizes two reticle-sized dies in a chiplet configuration. While TPUv7’s raw performance is estimated to be slightly below the B200’s (within ~5%), its simpler core construction presents an inherent cost advantage for Google, particularly as Google manufactures the chip for its own use and avoids external vendor margins. This advantage facilitates both scaling up, enabling massive configurations like the announced 9,216-chip pod, and scaling out, by making individual accelerators more economical to deploy.
The performance leap with Ironwood over its predecessor (Trillium) is substantial; a quick arithmetic check follows the list:
- 5x more peak compute capacity, reaching 4,614 TFLOPs per chip. A full 9,216-chip pod delivers a staggering 42.5 Exaflops.
- 6x the High-Bandwidth Memory (HBM), offering 192 GB per chip.
- 4.5x the HBM bandwidth (7.2 TBps per chip).
- 1.5x the Inter-Chip Interconnect (ICI) bandwidth (1.2 Tbps bidirectional).
- An enhanced SparseCore for accelerating ultra-large embedding models.
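That pod-level figure follows directly from the per-chip number; a quick back-of-envelope check in Python, using only the figures quoted above:

```python
# Sanity check: per-chip peak compute times pod size, converted from
# TFLOPs to Exaflops (a factor of 1e6).
chips_per_pod = 9_216
tflops_per_chip = 4_614
pod_exaflops = chips_per_pod * tflops_per_chip / 1e6
print(f"{pod_exaflops:.1f} EFLOPs")  # ~42.5, matching the announced figure
```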
It’s important to note that TPUs and GPUs process data differently; TPUs are often bottlenecked by memory capacity and bandwidth, whereas GPUs can be limited by raw compute power. Ironwood’s significant HBM upgrades directly address potential TPU bottlenecks.
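One way to make that concrete is a rough roofline-style calculation: dividing peak compute by HBM bandwidth gives the arithmetic intensity (FLOPs per byte) a workload needs before the chip stops being bandwidth-bound. The sketch below uses Ironwood’s published per-chip numbers; the interpretation is our own back-of-envelope, not a Google figure:

```python
# Roofline "ridge point": FLOPs per byte of HBM traffic required to keep
# the compute units saturated. Workloads below this intensity (e.g. LLM
# decode, which streams the model weights for every generated token) are
# limited by memory bandwidth rather than by compute.
peak_flops = 4_614e12    # FLOP/s per Ironwood chip
hbm_bandwidth = 7.2e12   # bytes/s per chip
ridge_point = peak_flops / hbm_bandwidth
print(f"~{ridge_point:.0f} FLOPs/byte")  # ~641
```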
Critically, these gains come with impressive efficiency improvements. Ironwood delivers twice the performance per watt of Trillium. However, it’s a power-hungry chip, consuming around 1kW, necessitating liquid cooling solutions across the densely packed pods to sustain performance. As a cornerstone of Google Cloud’s AI Hypercomputer architecture, Ironwood, combined with the Pathways software stack, is set to power demanding models like Gemini 2.5 and AlphaFold, aiming to deliver significantly higher “intelligence per dollar.” Ironwood will be available to Google Cloud customers later in 2025 in 256-chip and 9,216-chip configurations.
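Taking the ~1kW figure at face value, the implied numbers explain both the efficiency claim and the cooling requirement; note this counts compute chips only, ignoring networking, cooling, and facility overhead:

```python
# Implied per-chip efficiency and full-pod power draw, assuming ~1 kW/chip.
chip_power_w = 1_000
tflops_per_watt = 4_614 / chip_power_w     # ~4.6 peak TFLOPs per watt
pod_power_mw = 9_216 * chip_power_w / 1e6  # ~9.2 MW for a 9,216-chip pod
print(f"~{tflops_per_watt:.1f} TFLOPs/W, ~{pod_power_mw:.1f} MW per pod")
```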
Complementing the hardware is support for vLLM, a popular open-source inference stack that serves nearly all open-source LLMs, including DeepSeek R1, DeepSeek V3, Llama 4, and Cohere Command R. Native vLLM support, combined with Google’s dynamic scaling (including scale-to-zero), makes self-hosting viable at both the application and the enterprise level. These dynamic scaling capabilities will also support Nvidia’s Blackwell GPUs, but adding TPU support gives GCP customers another option.
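For readers who haven’t used vLLM, serving an open-source model takes only a few lines. A minimal offline-inference sketch (the model checkpoint is illustrative, and running on TPUs requires vLLM’s TPU backend rather than the default GPU one):

```python
from vllm import LLM, SamplingParams

# Load any vLLM-supported open-source checkpoint; this one is illustrative.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain what an inference-optimized TPU is."], params)
print(outputs[0].outputs[0].text)
```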
Beyond the silicon, Google unveiled powerful new tools to foster an ecosystem of AI agents: the Agent Development Kit (ADK) and the Agent2Agent (A2A) protocol.
- The ADK is an open-source framework designed to drastically simplify building sophisticated multi-agent systems, potentially allowing agent creation in under 100 lines of code (see the sketch after this list). It integrates with the Model Context Protocol (MCP) for tool usage, connects to enterprise APIs via Apigee, and provides access to Agent Garden – a library of over 100 pre-built connectors and samples.
- The A2A protocol is a groundbreaking open standard enabling agents to communicate and collaborate, irrespective of their underlying framework or vendor – a first from a hyperscaler. Built on existing standards (HTTP, SSE, JSON-RPC) and designed for security, long-running tasks, and multiple modalities (text, audio, video), A2A facilitates capability discovery (via “Agent Cards”), task management, and collaboration between agents.
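To give a feel for the ADK’s “under 100 lines” claim, here is a minimal single-agent sketch in the style of the framework’s published quickstart; the tool, instructions, and model choice are illustrative, not taken from the announcement:

```python
from google.adk.agents import Agent

def lookup_order_status(order_id: str) -> dict:
    """Illustrative tool: a real agent would call an order system here."""
    return {"order_id": order_id, "status": "shipped"}

# A single agent with one tool; the framework also supports composing
# multiple agents into larger multi-agent systems.
root_agent = Agent(
    name="order_status_agent",
    model="gemini-2.0-flash",
    description="Answers questions about customer order status.",
    instruction="Use lookup_order_status to answer questions about orders.",
    tools=[lookup_order_status],
)
```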
The potential of A2A is underscored by the support of over 50 partners, including heavyweights like Accenture, Box, Deloitte, Salesforce, SAP, and ServiceNow, all contributing to defining the protocol and signaling a shared vision for interoperable, specialized AI agents. This ecosystem approach is vital for realizing the vision of agents seamlessly collaborating on complex tasks, like the candidate sourcing example where different agents handle discovery, scheduling, and background checks within a unified interface like Google Agentspace.
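Capability discovery itself is deliberately lightweight: an agent publishes an “Agent Card” (a JSON document) at a well-known URL that any client can fetch over plain HTTP. A sketch of that discovery step (the host is a placeholder, and the card fields follow the initial public A2A spec):

```python
import requests

# Fetch another agent's Agent Card to discover its capabilities.
# agent.example.com is a placeholder; the well-known path follows the
# initial A2A specification.
card = requests.get("https://agent.example.com/.well-known/agent.json").json()
print(card["name"], "-", card.get("description", ""))
for skill in card.get("skills", []):
    print("  skill:", skill.get("id"), skill.get("name"))
```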
Google Agentspace itself serves as a crucial piece of this puzzle, providing the environment where users interact with these agents. Its pre-built connectors allow agents to securely access and utilize enterprise data from systems like Confluence, Jira, ServiceNow, and SharePoint, grounding AI insights in relevant business context.
Zooming out, Google Cloud Next 2025 showcased a cohesive, multi-layered AI strategy. From the raw power of Ironwood TPUs optimized for inference, through the enabling frameworks of ADK and A2A fostering agent collaboration, to the comprehensive Vertex AI platform and a clear focus on vertical-specific solutions, Google is building a formidable AI stack.
The emphasis on vertical advantage, inferred from the numerous industry-specific use cases (Automotive, Healthcare, Retail, etc.), purpose-built agents (like the Mercedes-Benz Automotive AI Agent or The Home Depot’s Magic Apron), and partnerships with industry experts, suggests a strategic push to deliver tailored AI value. Grounding models with trusted third-party data sources further reinforces this focus on industry relevance.
The age of pervasive, thinking AI is rapidly approaching. With its powerful new silicon, open standards for agent collaboration, a comprehensive platform, and a deepening focus on industry-specific needs, Google Cloud is positioning itself not just as a provider of AI tools, but as a key architect of this intelligent future. The synergy between hardware like Ironwood and software frameworks like ADK/A2A promises to unlock new levels of productivity and innovation across the board.