Microsoft, one of the world’s largest software companies, can animate portrait photographs with its “VASA-1” artificial intelligence system.
“VASA-1”, developed by Microsoft, which has recently placed great emphasis on artificial intelligence, attracted a lot of attention today as the strongest option in its category. The artificial intelligence system, which you can see in action in the post just below, takes an uploaded portrait photo, analyzes it and turns it into a video. The system reaches an incredibly high level of realism and surpasses previous systems by far. It is not yet publicly available because the risk of misuse is quite high at this stage. The technology synchronizes lip movements by analyzing a given audio file, can simulate different emotional states, and works in three dimensions, creating not only facial expressions but also head movements.
Microsoft just dropped VASA-1.
This AI can make single image sing and talk from audio reference expressively. Similar to EMO from Alibaba
10 wild examples:
1. Mona Lisa rapping Paparazzi pic.twitter.com/LSGF3mMVnD
— Min Choi (@minchoi) April 18, 2024
Before this, Google had attracted attention with its VLOGGER system. Google VLOGGER, developed by researchers and presented as a research project for now, detects people in uploaded photos and can animate them so that they appear to speak. Thanks to this system, people can use just a single photo to create virtual versions of themselves that speak realistically and export the result as video. The system, which is not yet perfect, can also generate a person’s voice based on a supplied recording.
The system, which could open big doors with a little more development, raises some concerns about abuse, but it is also reported that many precautions are being worked on in this regard. The system, which does not require special artificial intelligence model training to produce a good result, was developed on a data set called MENTOR, which reportedly includes more than 800,000 different people and 2,200 hours of video. It is reported that work will continue on the system, which could be used in many areas, from games to content production and virtual reality.