Physical Intelligence, a San Francisco-based robotics startup, published new research on Thursday demonstrating that its latest AI model can direct robots to perform tasks they were never explicitly trained to complete. The company's researchers described the capability as an unexpected and significant step towards creating a general-purpose robot brain.
The model, named π0.7, represents progress toward a system that can be given a new task in plain language, be coached through it, and execute it successfully. The findings suggest robotic AI may be approaching an inflection point where capabilities begin to compound in unforeseen ways, similar to the evolution of large language models.
Core Breakthrough: Compositional Generalisation
The paper's central claim is that π0.7 exhibits "compositional generalisation" – the ability to combine skills learned in different contexts to solve novel problems. This breaks from the standard approach of training specialist models for each individual task through rote memorisation of specific data.
“Once it crosses that threshold where it goes from only doing exactly the stuff that you collect the data for to actually remixing things in new ways,” said Sergey Levine, a co-founder of Physical Intelligence and a UC Berkeley professor, “the capabilities are going up more than linearly with the amount of data.”
The Air Fryer Demonstration
The most striking example involved a robot using an air fryer it had essentially never encountered in training. The research team found only two relevant data points in its entire training dataset: one where a robot pushed an air fryer closed and another from an open-source dataset where a robot placed a bottle inside one.
“It’s very hard to track down where the knowledge is coming from, or where it will succeed or fail,” said Ashwin Balakrishna, a research scientist at the company. With no coaching, the model made a passable attempt at cooking a sweet potato. With step-by-step verbal instructions, it performed the task successfully.
Limitations and Refinement
The researchers emphasised the model's current constraints. It cannot execute complex, multi-step tasks from a single high-level command, such as "go make me some toast." However, it can succeed if guided through each sub-step verbally.
They also acknowledged that failures can stem from human error in "prompt engineering." Balakrishna cited an experiment where refining how a task was explained to the model raised its success rate from 5% to 95%.
The company measured π0.7 against its own previous specialist models and found the generalist model matched their performance in tasks like making coffee, folding laundry, and assembling boxes.
A 'Genuinely Surprising' Trajectory
The researchers highlighted that the results have been genuinely surprising, even to experts who know the training data intimately. Balakrishna described testing the model on a random gear set, asking, “Hey, can you rotate this gear?” The robot completed the task.
Levine compared the moment to early encounters with GPT-2 generating unexpected content. “Where the heck did it learn about unicorns in Peru?” he said. “That’s such a weird combination. And I think that seeing that in robotics is really special.”
Commercial Path and Funding
Physical Intelligence has raised over $1 billion to date and was most recently valued at $5.6 billion. The company is reportedly in discussions for a new funding round that would nearly double its valuation to $11 billion. The team declined to comment on the talks.
When asked about a timeline for real-world deployment of a system based on these findings, Levine declined to speculate. “I think there’s good reason to be optimistic, and certainly it’s progressing faster than I expected a couple of years ago,” he stated. “But it’s very hard for me to answer that question.”