Terminator, C-3PO (Star Wars) or the replicants of Blade Runner? For nearly half a century, works of science fiction have imagined robots of every kind, with varied looks and purposes. They have one thing in common, however: their ability to live in the human world, and even to interact with humans, for good… and bad reasons. Sarah Connor, hunted by the first of these, can attest to that. The reality is obviously different. Robotics has long run up against Moravec's paradox, named for the Canadian researcher who wrote in 1988: "It is comparatively easy to make computers exhibit adult-level performance on intelligence tests or playing checkers, but difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility." The failure of a humanoid robot like Pepper, unplugged by SoftBank in 2021, bears this out. The heroine imagined by James Cameron can therefore breathe easy. Well, perhaps not for much longer.
“Recent advances in artificial intelligence have changed the way we approach these problems. It is a reboot, a relaunch of the discipline,” the Frenchman Vincent Vanhoucke, researcher and head of robotics at DeepMind (Google), tells L’Express. In mid-summer, his company published a new specialized learning model called RT-2 (Robotics Transformer 2). It builds on recent advances in artificial intelligence (AI), and more particularly in LLMs (large language models), which underpin conversational agents like ChatGPT. As their name suggests, these models excel at natural language, but also at visual recognition. Fed on immense bodies of knowledge made up of text and images, the machines thus acquire a real perception of the world around them. “For example, in a kitchen, a robot will know that there are, among other things, cutlery, a tap, a sink. It will also have an idea of their specific features, their use, and so on. It will therefore instinctively understand what it is capable of doing, or not, in this environment,” Vincent Vanhoucke summarizes.
Language, vision and movement
Instructions are given in plain English, “which considerably lowers the technological barrier”. And as with ChatGPT or Bard, Google’s in-house AI, certain forms of reasoning emerge in the robot: common sense, which it turns into movement. The thousands of demonstrations carried out by DeepMind show as much. In one of them, two stuffed animals are placed on a table: an octopus and a horse. Asked, abstractly, to pick up “the land animal”, the robotic arm developed by Vincent Vanhoucke’s teams chooses the horse without hesitation. Other objects then fill the table. Asked to grab something that would make a good improvised hammer, the robot picks up a stone. It also moves a soda can next to a photo of the artist Taylor Swift, whom it recognizes among several stars, and hands an energy drink to someone claiming to be exhausted.
In another series of demonstrations, carried out for the New York Times, one person even asks it to “pick up Lionel Messi”, the famous football player. Nonsense, at first glance. The robotic arm nevertheless opts for the closest thing it can find: the kind of ball the Argentine genius usually plays with. Very schematically, the strength of RT-2 lies in its use of tokens, the sequences of numbers that represent words and sentences inside LLMs, to encode each action as well. This is what makes the interaction between the robot’s language, vision and movement possible.
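To give an idea of the principle, here is a minimal sketch in Python: continuous motor commands are discretized into numbered bins so the model can “write” an action the way it writes a sentence. The bin count and the value ranges below are illustrative assumptions, not DeepMind’s actual code.

```python
# Illustrative sketch only: the binning scheme (256 bins, values in [-1, 1])
# is an assumption for the example, not RT-2's published implementation.
def action_to_tokens(action, num_bins=256):
    """Turn a continuous robot action (e.g. gripper x, y, z) into integer
    tokens, so the language model can emit it like ordinary text."""
    tokens = []
    for value in action:
        clipped = max(-1.0, min(1.0, value))          # keep value in range
        tokens.append(int((clipped + 1.0) / 2.0 * (num_bins - 1)))
    return tokens

def tokens_to_action(tokens, num_bins=256):
    """Invert the discretization: map predicted tokens back to motor commands."""
    return [idx / (num_bins - 1) * 2.0 - 1.0 for idx in tokens]

# The model "speaks" the action as a short sequence of tokens.
print(action_to_tokens([0.1, -0.4, 0.9]))   # -> [140, 76, 242]
```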
“Previously, the dialogue between these three aspects was limited. Learning new tasks required huge amounts of data about each object or situation. This is why most robots are found in environments such as factories, on highly structured automated lines. Now, with a model that links the three parameters, interactions emerge. The robot can manipulate objects it has never seen, but that the vision model has already observed. It can move through varied environments it has never visited, but which the language model nonetheless knows. These new characteristics offer robots, for the first time, the possibility of living alongside humans,” explains the distinguished researcher, a graduate of the École centrale de Paris (now CentraleSupélec) and of Stanford University in California.
Isn’t that, despite everything, a bit dangerous? Vincent Vanhoucke smiles when we bring up the three laws of robotics imagined by the writer Isaac Asimov, meant to protect humans from the dangers of the technology. “As a child, I grew up watching Goldorak, those great Japanese robots that came to the rescue of humanity,” he says. Then, more seriously: “In the kitchen example, we can introduce constraints in a natural way: ‘Do not touch pointed or sharp objects.’ That is a very high-level concept from a cognitive point of view. But thanks to the language model, it is also interpretable by the robot.” More prosaically, RT-2 is harmless without Wi-Fi, and it has an “OFF” button.
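In practice, a constraint of this kind can simply be expressed in the same natural-language instruction the robot already reads. The sketch below assumes a hypothetical text-driven policy and a made-up preamble; it illustrates the idea, not DeepMind’s interface.

```python
# Minimal sketch: safety rules written in plain language, prepended to the task,
# so the same vision-language-action model interprets both. The function and
# wording here are hypothetical examples.
SAFETY_PREAMBLE = (
    "Constraints: do not touch pointed or sharp objects. "
    "Stop immediately if a human hand enters the workspace."
)

def build_instruction(task: str) -> str:
    """Combine high-level safety constraints with the user's request."""
    return f"{SAFETY_PREAMBLE} Task: {task}"

print(build_instruction("bring me a spoon from the drawer"))
```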
Assisting Iron Man
Some errors and approximations, the same ones found in LLMs, remain. The success rate of the DeepMind engineers’ experiments varies from under 50% (reasoning) to over 75% (symbol recognition). “I have a lot of hope for what we are developing, but objectively, it’s still primitive,” admits Vincent Vanhoucke, who rules out any idea of marketing a robot for the moment. Back to our Moravec paradox. “Interaction with the physical world is extremely difficult. It is a real challenge in terms of mobility and dexterity,” the DeepMind expert adds. Nor is the humanoid necessarily the best form to pursue, even if it is the path chosen by Elon Musk, among others, for his Optimus robot, sales of which could begin around 2027, and by Boston Dynamics, the other big name in robotics, with its Atlas, a sturdy 1.88-meter hulk of metal. This former Google subsidiary also made its name with robot dogs, rugged machines capable of moving with astonishing agility, opening doors and lifting objects.
“There are robots, in more recent fiction, that are just as interesting,” continues Vincent Vanhoucke. “In the first installment of the Iron Man saga, Tony Stark has machines in his workshop that help him with physical tasks, with his tinkering, in often humorous scenes. And these robots are nothing fantastic from a mechanical point of view: most are simple arms. The difference is in the intelligence. These robots understand what the hero asks and react to his actions. Integrating this into basic robots already available on the market could completely change the way we use them in everyday life.”
Industry, sport, household chores, cooking, helping people in difficulty… The range of possibilities is vast, all the more so if voice commands are added on top of text. Nothing insurmountable. The contribution of LLMs paves the way for assistants that can do everything. Really, everything. Ironically, ChatGPT, Bing, Bard, Claude and the rest were predicted to remain forever stuck behind their screens, useless for manual tasks. Technological developments are already proving the opposite.