A realistic path to robotic foundation models

Jun 5, 2024

Some thoughts and excitement after revisiting the industry thanks to Physical Intelligence founders Sergey Levine and Chelsea Finn. Not “agents” and not “AGI.”


4 Comments

David Berreby

Jun 6, 2024

Great post, Nathan. Curious about a couple of things.

1. What's the equivalent of a word (or part of a word) in the realm of actions? IOW, what does a token represent in the world of actions? "Text in, audio out" works because there is a huge database of sound on which the model was trained, right? But how can there ever be a huge database of actions, given that actions depend on the specific embodiment and specific context of each robot? (I hope it's obvious that this isn't a rhetorical question. I'd really like to better understand how robot-makers are arriving at these models.)

2. Don't you think there is going to be a lot of hesitation about letting human tele-operators peer into our diaper changes and snack-sneaking and other home situations? I bet people will be creeped out at the thought. Or do you think convenience is going to trump such concerns, as it has in the past? In the current anti-tech climate I just wonder if people are going to let home-helper robots collect this kind of data (especially when a human backup driver is involved).

1. This is where the rubber really hits the road. In reality, I think one token is approximately one set point for some low-level PID controller. Standardizing this across robots is EXTREMELY messy and hard. I'm amazed that a messy first approach ever showed any sign of life. I've heard what they did for RT-X was a bit hacky, but I don't have the details. I'll pry a bit more with people who work there.

2. Yes. Especially in affluent Western cultures, but I don't think it is uniform. There are certainly enough people who would want this, e.g. only during the day when no one is home.
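
To make the "one token is approximately one set point" idea from point 1 concrete, here is a minimal sketch of one way a continuous set point could be mapped to a discrete token and back. Everything in it is an assumption for illustration: the uniform binning, the choice of 256 bins, and the names `JointRange`, `setpoint_to_token`, and `token_to_setpoint` are hypothetical and are not a description of what RT-X or any other system actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JointRange:
    """Allowed set-point range for one controlled dimension (hypothetical)."""
    low: float
    high: float


NUM_BINS = 256  # assumed vocabulary size per action dimension


def setpoint_to_token(value: float, rng: JointRange, num_bins: int = NUM_BINS) -> int:
    """Clip a continuous set point to its range and return its bin index."""
    clipped = min(max(value, rng.low), rng.high)
    frac = (clipped - rng.low) / (rng.high - rng.low)
    # frac == 1.0 would land in bin num_bins, so cap at the last valid bin.
    return min(int(frac * num_bins), num_bins - 1)


def token_to_setpoint(token: int, rng: JointRange, num_bins: int = NUM_BINS) -> float:
    """Decode a token to the center of its bin, in the robot's own units."""
    width = (rng.high - rng.low) / num_bins
    return rng.low + (token + 0.5) * width


def tokenize_action(setpoints: list[float], ranges: list[JointRange]) -> list[int]:
    """Tokenize one action: one token per controlled dimension."""
    return [setpoint_to_token(v, r) for v, r in zip(setpoints, ranges)]


# Two hypothetical robots whose joints have different ranges.
arm_a = JointRange(-1.0, 1.0)
arm_b = JointRange(-3.14, 3.14)

token = setpoint_to_token(0.0, arm_a)
print(token, token_to_setpoint(token, arm_a), token_to_setpoint(token, arm_b))
```

The same token decodes to a different physical set point on each robot unless the ranges (and units, and control modes) are aligned, which is one small illustration of why standardizing across embodiments is messy.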


Rs

Jun 6, 2024

Can you elaborate on your view of Covariant? Are you saying they are using an older approach and not adapting to a tokenized world?

Mostly, I think they wanted to do this but were early. Now they have customers and stuff that'll steer culture. They apply similar tools, but to narrow customer tasks.

They still announced their own foundation model, so we'll see if they can pivot; they're now weighed down by inertia elsewhere. https://covariant.ai/insights/introducing-rfm-1-giving-robots-human-like-reasoning-capabilities/
