Google I/O 2024: Ushering in the Gemini Era
During the much-anticipated Google I/O 2024 keynote, Sundar Pichai, CEO of Google, outlined the tech giant’s bold foray into what has been dubbed the “Gemini era”, marking a new chapter in the development and application of artificial intelligence (AI). Over the two-hour keynote, the word “AI” was mentioned 121 times, as counted by Gemini itself, underscoring the accelerating pace of AI-based features and services and of the infrastructure built to support them.
A Decade of AI Innovation
The keynote began with Pichai reminiscing about Google’s decade-long investment in AI. The company has pioneered innovations at every layer of technological development, including research, product, and infrastructure. As Google ventures deeper into the AI platform shift, Pichai emphasized the vast opportunities ahead for creators, developers, startups, and the global community. The Gemini era encapsulates this momentum, promising to harness these opportunities to their fullest potential.
Google I/O marks the beginning of the tech developer conference season, so there is no comparison to be made with other events yet. Still, an interesting observation from this year’s keynote is that, despite Pichai’s emphasis on opportunities for creators, developers, and startups, the focus was on showcasing AI advancements primarily within Google’s own products rather than in third-party applications. This approach aligns with previous years, indicating a consistent strategy.
What’s noteworthy is the clear distinction between Google’s consumer-focused services and Google Cloud’s enterprise solutions. Google remains committed to enhancing its own suite of products with AI, while Google Cloud acts as an enabler for businesses. This separation contrasts with Microsoft’s integrated approach, where AI tools like Copilot are seamlessly woven into Azure, delivering unified value across both consumer and enterprise segments.
Introducing Gemini: A Multimodal Frontier
Google has announced several updates across the Gemini family of AI models, introducing new capabilities and improvements:
- 1.5 Flash: A new lightweight model optimized for speed and efficiency, designed for high-volume, high-frequency tasks. It features a breakthrough long context window and is the fastest Gemini model served in the API.
- 1.5 Pro Enhancements: The 1.5 Pro model has been significantly improved, extending its context window to 2 million tokens and enhancing its abilities in code generation, logical reasoning, multi-turn conversations, and understanding audio and images.
- Gemini Nano Updates: Gemini Nano is expanding to include image inputs in addition to text. Starting with Pixel devices, it will support multimodal applications that understand text, sight, sound, and spoken language.
Next Generation Open Models:
- Gemma 2: A new generation of open models featuring a new architecture for enhanced performance and efficiency, available in various sizes (a local usage sketch follows this list).
- PaliGemma: Google’s first open vision-language model in the Gemma family, inspired by PaLI-3. Additionally, the Responsible Generative AI Toolkit has been upgraded with the LLM Comparator for evaluating model response quality.
- Project Astra: As part of Google DeepMind’s mission to build AI responsibly, Project Astra represents Google’s vision for the future of AI assistants, aimed at developing universal AI agents to assist in everyday life.
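Because Gemma weights are openly released, running them locally is what sets them apart from the hosted Gemini models. The sketch below is a minimal illustration using Hugging Face Transformers; the checkpoint ID google/gemma-2-9b-it, the prompt, and the hardware setup are illustrative assumptions rather than details from the keynote, and downloading the weights requires accepting Google’s Gemma license on the Hub.

```python
# Minimal sketch: running an open Gemma 2 checkpoint locally with Hugging Face
# Transformers. The model ID is an assumption about the published checkpoint names;
# weights must be downloaded after accepting the Gemma license.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"  # assumed instruction-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # needs accelerate

prompt = "Explain in one sentence what a long context window lets a model do."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```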
As AI usage continues to grow, encompassing more users and diverse features, the emphasis on efficiency has never been more crucial. This focus is essential to provide a seamless and compelling user experience and to manage the operational costs of running these sophisticated models. Google also seemed to believe it is equally important to educate users about the nuances of AI models. Not all models are created equal; the number of parameters and processing speed can significantly impact performance and outcomes. Understanding these differences is key to leveraging AI’s full potential while maintaining cost-effectiveness.
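To make that tradeoff concrete, here is a minimal sketch assuming access to the Gemini API through the google-generativeai Python SDK: a short, high-volume request goes to 1.5 Flash, while a longer multimodal request goes to 1.5 Pro. The placeholder API key, file name, and prompts are illustrative assumptions, not details from the keynote.

```python
# Minimal sketch: choosing a Gemini model by task with the google-generativeai SDK.
# The API key placeholder, file name, and prompts are illustrative assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# 1.5 Flash: lightweight and fast, suited to high-volume, short-turnaround tasks.
flash = genai.GenerativeModel("gemini-1.5-flash")
print(flash.generate_content("Summarize Google I/O 2024 in one sentence.").text)

# 1.5 Pro: larger context window and stronger multimodal reasoning, at higher cost.
pro = genai.GenerativeModel("gemini-1.5-pro")
clip = genai.upload_file("keynote_clip.mp3")  # hypothetical audio file via the File API
request = [clip, "List the product announcements mentioned in this clip."]

# Counting tokens first gives a rough handle on cost before sending a large request.
print(pro.count_tokens(request))
print(pro.generate_content(request).text)
```

Routing most traffic to the smaller model and reserving the larger one for requests that genuinely need the extra context is exactly the kind of cost-versus-capability decision the keynote was nudging users to think about.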
The TPU Backbone
Google also announced Trillium, its sixth-generation Tensor Processing Unit (TPU), which is designed to support the training and serving of the next generation of AI foundation models with increased efficiency and performance. This new TPU achieves a 4.7X increase in peak compute performance per chip compared to its predecessor, TPU v5e, by doubling the High Bandwidth Memory (HBM) capacity and bandwidth, as well as the Interchip Interconnect (ICI) bandwidth. Trillium also introduces the third-generation SparseCore, a specialized accelerator for processing ultra-large embeddings essential for advanced ranking and recommendation workloads.
Trillium’s advancements make it possible to train foundation models faster, serve those models with reduced latency, and lower the cost of operations. Moreover, Trillium TPUs are over 67% more energy-efficient than the previous generation, aligning with Google’s commitment to sustainability. Google’s introduction of Trillium underscores the company’s ongoing investment in custom AI-specific hardware to push the boundaries of AI research and application while lowering the dependence on Nvidia’s GPUs.
Workspace, Search and Android
Across these three areas, the core message is the same: Gemini is making users’ lives easier.
Google has rolled out significant updates to its Search platform, incorporating advanced generative AI capabilities to enhance user experience. These updates are powered by a custom Gemini model, designed to streamline the search process by providing AI Overviews for quick, comprehensive insights into various topics. This feature, which has been popular among users during its experimental phase in Search Labs, is now available to all users in the U.S., with plans for global expansion. These updates mark a significant step in Google’s ongoing efforts to reimagine and enhance the capabilities of Google Search, making information more accessible and tailored to user needs.
In Workspace, Gemini’s updates include the integration of Gemini 1.5 Pro in a refreshed side panel across Workspace apps, offering advanced question-answering and insightful responses. Additionally, Gemini capabilities are now extended to the Gmail app on mobile, facilitating tasks on the go. The announcement also highlights Gemini’s role in connecting multiple applications through AI-powered workflows.
On Android, Google is making Gemini a more powerful replacement for Google Assistant, as the updates integrate Gemini more deeply into the Android OS and Google apps. Users can now bring up Gemini as an overlay on top of other apps, drag and drop AI-generated images into Gmail and Google Messages, and use a new “Ask this video” feature on YouTube for specific information retrieval. At $19.99 per month, Gemini Advanced subscribers gain access to additional features like “Ask this PDF”, which surfaces document insights without reading the full document. Additionally, Gemini Nano will be updated to support multimodal inputs, enhancing its processing capabilities across text, visuals, and audio.
Circle to Search, initially introduced at Samsung’s Unpacked event, now offers new capabilities designed to assist kids with homework directly from supported Android phones and tablets. When faced with a challenging problem, students can use Circle to Search to access step-by-step instructions that guide them through the solution process. Google states that this feature can handle various problems involving symbolic formulas, diagrams, graphs, and more.
Embracing the Future with Responsible AI
It was good to close the keynote with a commitment from Google that, when it comes to AI, the focus remains not just on technological advancement but on advancing responsibly. Google announced new AI safeguards and tools designed to make learning more engaging and accessible, emphasizing the transformative power of AI to improve lives and make the world a better place when developed responsibly. The announcement highlighted the introduction of LearnLM, a new family of models fine-tuned for learning, which integrates research-backed learning science into Google products to make learning more personalized and accessible. LearnLM powers features across Google’s products, including Gemini, Search, YouTube, and Google Classroom, offering educational enhancements like step-by-step study guidance in the Gemini app and interactive features in YouTube educational videos. Google also emphasized its collaboration with educational institutions to test and improve these models, extending LearnLM’s capabilities beyond Google’s own products.
Considering Google’s presence in K-12 education, particularly in the US, the push into education seems like a natural fit and a great opportunity not only to become even more entrenched in schools but also to train the next generation of talent on its tools. I find the collaboration with the likes of Khan Academy, which has embraced generative AI as a transformational force in how kids learn, extremely exciting. I truly believe that AI has the opportunity to disrupt education as we know it by providing a much more personalized way of learning and equipping kids with the skills they will need in their lives.
As I reviewed the long list of I/O announcements to write this article, I couldn’t shake the feeling that while Google is undoubtedly making strides to simplify and enhance our daily lives, it is also consolidating control over the information we access. The slogan “Let Google do the Googling” epitomizes this shift, representing perhaps the most significant redefinition of the web’s landscape to date. This duality—of convenience paired with control—raises important questions about the future of information accessibility and the level of trust users will have in Google.