Microsoft unveils VASA-1, an artificial intelligence that can animate photos and make them speak in an ultra-realistic way. The result is simply stunning! What remains is to prevent abuse…
Microsoft is betting big on artificial intelligence, to the point of investing tens of billions of dollars in it. Put simply, the company is integrating it into all of its services: its Microsoft 365 office suite, its Edge browser, its Bing search engine, its Windows tools… Thanks to its partnership with OpenAI, it is developing incredible technologies, such as its Copilot assistant, its image generator or VALL-E, the AI that imitates human voices. This time, the Redmond firm has revealed on its blog VASA-1, an artificial intelligence capable of animating photos of faces and making them speak in an ultra-realistic way. All it takes is a single portrait photo and an audio clip to produce a video that features precise lip syncing, stunning facial animations and natural head movements. A result as incredible as it is worrying…
The First AI-Generated Video That Looks Super Real
Microsoft Research announced VASA-1.
It takes a single portrait photo and speech audio and produces a hyper-realistic talking face video with precise lip-audio sync, lifelike facial behavior, and naturalistic head movements pic.twitter.com/6bxd4mEgFR
— Bindu Reddy (@bindureddy) April 17, 2024
VASA-1: impressively realistic results
Microsoft researchers achieved this feat by combining several complex deep learning technologies. VASA-1 can generate videos at a resolution of 512×512 pixels and a frame rate of 40 frames per second. It bears repeating: the result is breathtaking. It feels like watching real people talk, with all the nuances and subtleties of facial expressions. Lips move in rhythm with the words, eyes blink and look around naturally (although the gaze is sometimes a little blank), eyebrows rise and furrow… In addition, the AI can animate illustrations and handle audio in different languages, and even singing. We can also see the Mona Lisa trying her hand at rap, and it is well worth a look. A few details do betray the deception, however. The expressions can seem a bit exaggerated, and the frequent head movements can look somewhat artificial. In addition, the AI handles only the upper body and does not take into account non-rigid elements, such as hair or clothing. But other than that, the result is impressive!
In the future, VASA-1 could be very useful for anything that requires realistic speaking avatars, for example in video games, educational tools, therapy, etc. But the result is so realistic that there are legitimate concerns about the deepfakes such a technology could generate. The Microsoft teams are perfectly aware of this and admit that VASA-1 “could be misused to impersonate human beings”. The researchers therefore say they do not intend to release any online demo, API, product, additional implementation details, or any related offerings as long as they are not certain that the technology will be used responsibly and in accordance with appropriate regulations. That is just as well, because we still remember the fake audio of Emma Watson reciting Mein Kampf…