Google I/O Afterparty: The Future of Human-AI Collaboration, From Veo to Mariner

Google Labs is cooking up products that will shape AI creativity, computer use and how we develop knowledge.
Post methodology: Claude 4.0 via custom Dust assistant @TDep-SubstackPost with the system prompt: Please read the text of the podcast transcript in the prompt and write a short post that summarizes the main points and incorporates any recent news articles, substack posts or X posts that provide helpful context for the interview. Please make the post as concise as possible and avoid academic language or footnotes. please put any linked articles or tweets inline in the text. Please refer to Podcast guests by their first names after the initial mention. Light editing and reformatting for the Substack editor.

Google's recent I/O event marked a turning point in public perception of the company's AI capabilities. As one Google Labs leader put it, "it feels like the end of chapter one and the start of chapter two." In this Training Data episode, three Google Labs leaders—Thomas Iljic, Jaclyn Konzelmann, and Simon Tokumine—shared insights into the experimental products reshaping how we create, work, and learn.

Video Generation Gets Real

Thomas, who leads Google's Whisk and Flow products, revealed how far video generation has come. The team's Veo 3 model has moved past the infamous "Will Smith eating spaghetti" test that plagued early video AI. More importantly, their approach centers on "show and tell" rather than complex text prompts—letting creators use images and references like they would with a human collaborator.

The real breakthrough isn't just better physics or fewer six-fingered hands. It's the integration of audio generation alongside video, creating what Thomas calls a "generative AI camera" for filmmaking. Tools like Whisk (for consumers) and Flow (for filmmakers) are designed around the idea that creation should be iterative—you build a world, shoot in it, then refine and reshoot as needed.

Perhaps most intriguingly, Thomas sees video generation, simulation, and gaming converging into something new: "You're kind of world building. You're saying, 'This is the stage, these are the assets, these are how things are supposed to look.' And then you shoot in it."
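
To make the "show and tell" workflow concrete, here's a minimal sketch of image-conditioned, iterative video generation. Everything in it (the `VideoModel` client, the `Shot` structure) is a hypothetical stand-in rather than the actual Whisk, Flow or Veo API; the point is only that reference images fix the world while the prompt changes shot to shot.

```python
# A hedged sketch of "show and tell" video generation: the creator supplies
# reference images (characters, sets, style frames) alongside a short prompt,
# then iterates. `VideoModel` is a hypothetical stand-in, not the real
# Veo/Flow API.
from dataclasses import dataclass

@dataclass
class Shot:
    prompt: str                  # what happens in this shot
    reference_images: list[str]  # "show" material: characters, sets, style
    duration_s: float = 8.0

class VideoModel:
    """Hypothetical client; swap in a real video-generation API."""
    def generate(self, shot: Shot) -> str:
        # A real model would condition on the prompt AND the reference
        # images, returning video (with audio) rather than a placeholder.
        key = hash((shot.prompt, tuple(shot.reference_images))) & 0xFFFF
        return f"render_{key:04x}.mp4"

# World building: fix the reusable assets once, then "shoot" in that world.
world_refs = ["hero_character.png", "village_set.png", "style_frame.png"]
model = VideoModel()

shot = Shot(prompt="The hero walks into the village square at dawn",
            reference_images=world_refs)
clip = model.generate(shot)

# Iteration: keep the world, tweak only the direction, and reshoot.
shot.prompt = "Same shot, but rain begins to fall midway through"
reshoot = model.generate(shot)
print(clip, reshoot)
```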

Bonus Essay | From Playground to Proving Ground: The Evolution of Google Labs

Your AI Assistant Actually Uses Your Computer

Jaclyn's Project Mariner represents a different kind of breakthrough: AI that can actually use your computer the way a human would. Unlike approaches that parse a website's underlying code, Mariner works from screenshots, which lets it navigate any site or interface a person could.

The evolution from the initial version is telling. Early users loved watching the AI move their mouse around, but quickly asked: "Can I please use my browser again?" The solution was elegant—Mariner now runs tasks in background virtual machines while you continue working.

The real game-changer is parallel processing. As Jaclyn explained: "A big net win was the ability for Project Mariner to do 10 tasks at once, not just one." She described coming back from an errand with multiple things on her mind, opening Mariner, entering three different tasks, and sending them off to start making progress while she returned to her document work. "It was this magic moment of just okay, not only is progress being made on these things, but I just got it off my mind."
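
Mariner's internals aren't public, but the shape of the idea is easy to sketch: each task runs its own observe-act loop against screenshots in an isolated background session, and several tasks fan out in parallel. In the sketch below, `take_screenshot`, `choose_action` and the task list are all hypothetical stand-ins, not Mariner's API.

```python
# A hedged sketch of screenshot-driven, parallel browser agents in the
# spirit of Project Mariner. Every name here is a hypothetical stand-in.
import asyncio

async def take_screenshot(session_id: int, step: int) -> bytes:
    # Stand-in for capturing the background virtual browser's current frame.
    await asyncio.sleep(0.1)
    return f"frame-{session_id}-{step}".encode()

async def choose_action(goal: str, screenshot: bytes) -> str:
    # Stand-in for the model call: given the goal and what the page looks
    # like, pick the next UI action (click, type, scroll, or done).
    await asyncio.sleep(0.2)
    return "done"  # a real agent would emit clicks/typing until finished

async def run_task(session_id: int, goal: str, max_steps: int = 20) -> str:
    # One observe-act loop per task, in its own background session, so the
    # user's own browser stays free while work continues.
    for step in range(max_steps):
        frame = await take_screenshot(session_id, step)
        action = await choose_action(goal, frame)
        if action == "done":
            return f"[session {session_id}] finished: {goal}"
    return f"[session {session_id}] gave up: {goal}"

async def main() -> None:
    goals = [
        "Find three quotes for car insurance",
        "Book a table for four on Friday",
        "Compare prices on two espresso machines",
    ]
    # The "many tasks at once" part: fan the goals out concurrently.
    results = await asyncio.gather(*(run_task(i, g) for i, g in enumerate(goals)))
    print("\n".join(results))

asyncio.run(main())
```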

But Google's ambitions extend far beyond browser automation. Jaclyn outlined a three-part vision: smarter agents with better models, tool use, and memory; environments that span from local devices to virtual machines; and ecosystem integration where agents interact with other agents. The ultimate goal is "a capable agent that's able to operate in a way that is omnipresent across all your devices."

The trajectory points toward true computer use—not just web browsing. As Jaclyn noted, "Right now Project Mariner, it's in the browser. People use computers. So, you know, we call this 'computer use.' So there's that entire dimension as well that I think we're gonna continue to see innovations in."

Knowledge That Adapts to You

Simon's NotebookLM has been a viral hit with its AI-generated podcast feature but has evolved into something more ambitious: a system that adapts information to your needs. The core insight is that the same knowledge might need different forms—sometimes a podcast, sometimes a mind map, sometimes even a comic book.
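
That "same knowledge, different forms" idea reduces to a simple pattern: one grounded set of sources, many renderers. The sketch below is not NotebookLM's implementation; `generate_from_sources` is a hypothetical stand-in for a model call grounded in the user's documents.

```python
# A minimal sketch of adapting one body of source material into different
# forms (podcast script, mind map, comic outline). Not NotebookLM's actual
# implementation; `generate_from_sources` is a hypothetical model call.

FORMAT_PROMPTS = {
    "podcast": "Write a two-host conversational script covering the key points.",
    "mind_map": "Return a nested outline of concepts and their relationships.",
    "comic": "Break the material into panels with captions and simple scenes.",
}

def generate_from_sources(sources: list[str], instruction: str) -> str:
    # Stand-in: a real system would pass the sources plus the instruction
    # to an LLM and return output grounded in (and citing) those sources.
    return f"<{instruction[:40]}... over {len(sources)} sources>"

def render(sources: list[str], fmt: str) -> str:
    # Same knowledge, different form: only the rendering instruction changes.
    return generate_from_sources(sources, FORMAT_PROMPTS[fmt])

notebook = ["lecture_notes.pdf", "research_paper.pdf", "meeting_recording.txt"]
for fmt in FORMAT_PROMPTS:
    print(fmt, "->", render(notebook, fmt))
```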

The mobile launch represents a key shift from simply shrinking desktop experiences to creating complementary mobile-native features. Simon envisions recording conversations on your phone that automatically become searchable, transformable knowledge in your notebook.

What's particularly compelling is the focus on "units of knowledge" tied to longer-term projects. Rather than one-off queries, NotebookLM is designed for the sustained work that actually creates value—whether you're a knowledge worker or a student.

The Bigger Picture

These three products illustrate a broader theme: AI is moving from answering questions and generating content to actively participating in creative and professional workflows. The common thread isn't just better models—it's better interfaces that match how humans actually think and work.

As all three leaders noted, Google Labs’ biggest mistakes have been in timing—being too early rather than wrong about direction. But that early exploration is now paying off as model capabilities and costs reach practical thresholds.

The future they're building looks less like traditional software and more like adaptive, context-aware collaborators. Whether you're creating a film, managing tasks, or organizing knowledge, the AI doesn't just respond to commands—it understands your goals and adapts its capabilities accordingly.

Google Labs may be an experimental playground, but these experiments are increasingly looking like the future of how we'll work and create.

Hosted by Sonya Huang, Sequoia Capital


Mentioned in this episode:

  • The Not-So-SuperVillains Episode 0.5: Potential Disaster: 2D animation pilot from Google Labs

  • Whisk: Image and video generation app for consumers

  • Flow: AI-powered filmmaking with new Veo 3 model

  • Project Mariner: research prototype exploring the future of human-agent interaction, starting with browsers

  • NotebookLM: tool for understanding and engaging with complex information including Audio Overviews and now a mobile app

  • Shop with AI Mode: Shopping app with a virtual try-on tool based on your own photos

  • Stitch: New prompt-based interface to design UI for mobile and web applications

  • ControlNet paper: Outlined an architecture for adding conditional language to direct the outputs of image generation with diffusion models
