Discussion about this post

David Berreby

Great post, Nathan. Curious about a couple of things.

1. What's the equivalent of a word (or part of a word) in the realm of actions? IOW, what does a token represent in the world of actions? "Text in, audio out" works because there is a huge database of sound on which the model was trained, right? But how can there ever be a huge database of actions, given that actions depend on the specific embodiment and specific context of each robot? (I hope it's obvious that this isn't a rhetorical question. I'd really like to better understand how robot-makers are arriving at these models.)

2. Don't you think there is going to be a lot of hesitation about letting human tele-operators peer into our diaper changes and snack-sneaking and other home situations? I bet people will be creeped out at the thought. Or do you think convenience is going to trump such concerns, as it has in the past? In the current anti-tech climate, I just wonder whether people are going to let home-helper robots collect this kind of data (especially when a human backup driver is involved).

Rs

Can you elaborate on your view of Covariant? Are you saying they are using an older approach and not adapting to a tokenized world?
