Microsoft, one of the world’s largest software companies, has unveiled its Phi-3 artificial intelligence model.
The new model, offered through Azure, Hugging Face, and Ollama, has 3.8 billion parameters. Phi-3 Small (7 billion parameters) and Phi-3 Medium (14 billion parameters) versions are expected to follow. Phi-3 directly replaces Phi-2, which was released in December and is said to be as capable as Meta’s Llama 2. According to Microsoft, the new large language model (LLM) performs on par with GPT-3.5 despite its smaller size, and because of that smaller size it does not require huge server systems to run, keeping costs low. In the company’s words, Phi-1 focused on coding, Phi-2 brought reasoning into play, and Phi-3 brings coding and reasoning together. Before this, Microsoft attracted considerable attention with its “VASA-1” artificial intelligence system.
The AI system, which you can see in action in the post below, takes an uploaded portrait photo, analyzes it, and turns it into video. It reaches a remarkably high level of realism, far surpassing previous systems. It is not yet publicly available because the risk of misuse is still considerable at this stage. The technology synchronizes lip movements by analyzing a given audio file, can also simulate different emotional states, and adds a third dimension by generating not only facial expressions but also head movements.
Microsoft just dropped VASA-1.
This AI can make single image sing and talk from audio reference expressively. Similar to EMO from Alibaba
10 wild examples:
1. Mona Lisa rapping Paparazzi pic.twitter.com/LSGF3mMVnD
— Min Choi (@minchoi) April 18, 2024
Before this, Google had attracted attention with its VLOGGER system. VLOGGER, developed by Google researchers and presented for now as a research project, detects people in uploaded photos and can animate them as they speak. With this system, people can create realistic, speaking virtual versions of themselves in video format from just a single photo. The system, which is still not perfect, can also generate a person’s voice based on an entered recording.
The system, which could open big doors with a little more development, raises some concerns about abuse, but many safeguards are reportedly being worked on in this regard. It does not require special artificial intelligence model training to produce good results and was developed on a dataset called MENTOR, which reportedly includes more than 800,000 different people and 2,200 hours of video. Work is reported to continue on the system, which could be used in many areas from games to content production and virtual reality.