Inference by Sequoia Capital
Training Data
Building the "App Store" for Robots: Hugging Face's Thomas Wolf on Physical AI

The 'iPhone moment' for robots has arrived: Why accessible hardware and open-source software are about to democratize an entire industry.
Post methodology: Claude 4.0 via custom Dust assistant @TDep-SubstackPost with the system prompt: Please read the text of the podcast transcript in the prompt and write a short post that summarizes the main points. Please make the post as concise as possible and avoid academic language or footnotes. Refer to podcast guests by their first names after the initial mention. Light editing and reformatting for the Substack editor.

Thomas Wolf, co-founder and Chief Science Officer at Hugging Face, is painting a vivid picture of robotics today: a moment remarkably similar to the "transformers moment" for language models a few years back. Back then, software and massive text datasets converged to redefine what AI could do. With robots, the hardware has long been available (though expensive), but breakthrough software is finally emerging, making robotics accessible for everyone.

The Parallel Moment

Thomas sees robotics reaching its "iPhone moment." Just as every software developer became curious about AI and language models, a similar explosion is coming in robotics. Early breakthroughs at top academic labs and hands-on experiments—even with hobbyist-grade, 3D-printed parts—suggest that the missing link, adaptive software, is finally here.

Hugging Face's LeRobot platform is at the heart of this shift. LeRobot isn't just a tool; it's an ecosystem built on three pillars: software libraries, diverse datasets, and hardware integration. By replicating the open source, community-driven success of their transformers library, Hugging Face aims to empower every software developer, even those without traditional robotics training, to become a "roboticist."

Building a Community-Driven Ecosystem

The numbers tell a compelling story. Hugging Face's robotics community has grown to roughly 6,000 to 10,000 engaged developers, with worldwide hackathons spanning 100 locations across six continents. This goes beyond hobbyist enthusiasm; it's real-world validation of demand.

More importantly, this thriving community is generating innovative ideas and contributing to open datasets, broadening access to the multi-environment training data essential for real-world robotics. Thomas describes three distinct personas joining the movement: traditional roboticists seeking better software tools, AI developers drawn to physical manifestations of their work, and even investors buying $100 robotic arms to understand the space firsthand.

Solving the Data Bottleneck

Unlike digital language models that learn from trillions of internet tokens, robotics suffers from a scarcity of diverse, real-world data. You can't just scrape the web to teach a robot how to flip pancakes or navigate different environments.

LeRobot's approach is twofold: incentivize users to record and share their own task data, and explore synthetic data generation through world models—an emerging breakthrough that can deliver photorealistic, controllable simulations as training grounds. Thomas sees world models as potentially the first major breakthrough in simulated robotics data generation in years, offering a path to scale training without requiring millions of real-world demonstrations.
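
As a rough illustration of how those shared recordings become usable by anyone, here is a minimal sketch of pulling a community-contributed dataset from the Hugging Face Hub. It uses the general-purpose huggingface_hub client rather than the LeRobot library itself, and "lerobot/pusht" is an illustrative repo id, not something named in the episode:

```python
# Minimal sketch: fetching a community-shared robotics dataset from the
# Hugging Face Hub. Uses the general-purpose huggingface_hub client, not
# the LeRobot library itself; "lerobot/pusht" is an illustrative repo id.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="lerobot/pusht",  # example community dataset of recorded episodes
    repo_type="dataset",      # robotics recordings are published as Hub datasets
)
print(f"Dataset files downloaded to: {local_dir}")
```

From there, the same recordings one contributor captured on a $100 arm can be replayed or used for training by anyone else in the community.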

Hardware Strategy: Beyond the Humanoid Hype

Rather than chasing fully autonomous, expensive humanoids, Hugging Face is betting on accessibility. Their $100 robotic arm and devices priced from $300 to $500 are designed for fun, education, and practical DIY solutions. This strategy deliberately lowers entry barriers for startups, allowing them to build niche, specialized solutions rather than betting everything on catching up with high-cost, high-reliability consumer robotics.

Thomas argues that humanoids face two fundamental problems: they're expensive (requiring 60+ actuators that drive costs toward car-level pricing) and they trigger uncanny valley concerns. Instead, he envisions "a galaxy of different form factors"—some more specialized, some more affordable, many more socially acceptable than human-like robots.

The Global Competitive Landscape

China has emerged as a leader in open-source AI models. Thomas recently visited Chinese AI labs and found an intensely competitive internal market where companies compete partly on being the most open. When one company (Zhipu) decided to stop open-sourcing models, they faced immediate backlash, particularly in hiring, and quickly reversed course.

For Western founders, this creates both challenge and opportunity. Chinese companies often have "nothing to lose" from open-sourcing in Western markets since they don't sell APIs there anyway. Meanwhile, Western companies are rediscovering open source as a competitive strategy, with OpenAI recently returning to open-source releases on Hugging Face and other players following suit.

Business and Investment Implications

For founders, the opportunity is enormous. A community-driven approach means faster iteration and lower R&D costs. Startups can plug into a prebuilt ecosystem where expensive "reinvention of the wheel" is replaced by collaboration and open sharing.

Thomas already sees "many, many startups" building on their $100 robotic arm platform, taking the basic building blocks and creating businesses around specific manual tasks or physical world applications. With accessible hardware and emerging synthetic data capabilities, new ventures can quickly validate unique robotics use cases—whether in retail, entertainment, education, or bespoke enterprise solutions—without building every component from scratch.

The 10-Year Vision

Looking ahead, Thomas envisions a world where robotics is as ubiquitous as smartphones today. Rather than a future dominated by expensive humanoids accessible only to elites, we'll see a rich ecosystem with varied form factors serving multiple needs.

The goal isn't just better robots—it's democratizing access to physical AI. Thomas wants his kids to be able to "vibe code" robot behaviors, making robotics development as accessible as building a mobile app. AI will become a toolkit that amplifies human creativity, enabling solutions that extend beyond current human limitations.

The Open Science Philosophy

Underlying everything is Hugging Face's commitment to "open science"—not just sharing models, but teaching people how to build them. Thomas, drawing from his physics background, believes AI should eventually be as learnable as general relativity: fundamental knowledge accessible through books rather than locked behind corporate APIs.

This philosophy drives their detailed blog posts and books on training techniques, dataset creation, and distributed computing—turning users into contributors who then improve the entire ecosystem.

The Bottom Line

Hugging Face's leap into robotics isn't just about creating better robots; it's about democratizing access to physical AI. For founders, this means seizing a market at its nascent stage with enormous upside potential—where community collaboration, affordable hardware, and innovative data strategies converge to shape the future of everyday robotics.

The tools are rapidly becoming as accessible as an app store. The question isn't whether the robotics revolution will happen—it's what you'll build when it does.

Hosted by: Sonya Huang and Pat Grady


Mentioned in this episode:

  • SO-100: A 3D-printed robotic arm that Hugging Face has open sourced so people can download and fabricate it for around $100

  • LeRobot: Hugging Face hub for models, datasets, and tools for real-world robotics in PyTorch

  • Reachy Mini: Cute little robot made by Pollen Robotics (recently acquired by Hugging Face) that runs on LeRobot with prices starting at $299

  • TechBBQ: Startup event in Copenhagen that Thomas spoke at this year

  • Genie 3: World model from Google DeepMind

  • GPT-1: OpenAI’s first GPT was open sourced on Hugging Face

  • Llama.cpp and vLLM: Open source libraries for performing inference and serving LLMs

  • R1 1776: Open source model from Perplexity based on DeepSeek R1

  • JETP: Soviet-era Journal of Experimental and Theoretical Physics that Thomas tried to access as a physics researcher

  • The Ultra-Scale Playbook: Book published by Hugging Face on training LLMs on large GPU clusters

  • FineWeb: Hugging Face's large, high-quality web dataset, released with a detailed guide to building such datasets


Thanks for reading Inference by Sequoia Capital. Please pass this email on to your friends and colleagues you infer might like it.
