Meta, the parent company of WhatsApp, Instagram and Facebook, today announced Voicebox, a voice-focused artificial intelligence system.
Artificial intelligence systems are changing everything, including sound and music. Today it was Meta's Voicebox that made an impact in this field. The new system, introduced after MusicGen, which turns text into music, was announced directly by Meta CEO Mark Zuckerberg. The system, which the company has not yet opened to everyone, can turn text into realistic-sounding human speech (currently in six different languages). According to the statement, it was trained on more than 50,000 hours of audio. Judging by the first examples shown, the system works very well, and besides voicing text it can also clean up audio loaded into it: unwanted noises such as a barking dog or a car horn can be removed by the AI in seconds. The system, which is still under development, may be made available to everyone in the future. Alongside a video that clearly shows how the system works, Meta said: “In the future, multi-purpose generative AI models like Voicebox can give natural voices to virtual assistants and NPC characters in the Metaverse.” It also stated: “Voicebox can enable visually impaired people to hear text messages from friends as they are read by artificial intelligence in their own voice, give creators new tools to easily create and edit audio tracks for videos, and much more.”
Of course, this is not the first time artificial intelligence and audio have been on the agenda. For example, in recent months the voice and speech-focused AI startup ElevenLabs released in beta a platform that lets users create voiceovers, generate entirely new synthetic voices if desired, or clone someone’s voice. Once the system was opened for testing, it took only a few days for internet users (especially those on 4chan) to abuse it. The company was forced to make a statement on Twitter, saying that it had to take precautions against these abuses. According to reports, clips of famous names saying very bad things spread rapidly on 4chan. Using this system, users prepared audio clips in which famous people made homophobic, transphobic, violent and racist remarks. “Deepfake” technology made a big splash for similar reasons when it first emerged: thanks to deepfakes, the faces of female celebrities were added to a great deal of pornographic content.
Before this, Microsoft had also made itself heard in artificial intelligence and audio. In recent weeks the company came out with VALL-E. This system focuses on text-to-speech: it can analyze a person’s voice from a recording of only 3 seconds and make it usable in long voiceovers. According to initial explanations, although it uses only 3 seconds of data, the system can offer natural rather than robotic text-to-speech, and it is based on the codec technology called “EnCodec”. EnCodec, an artificial intelligence-assisted audio compression method, can compress audio heavily without loss of quality.
The development of VALL-E also drew on Meta’s data (60 thousand hours of speech, to be exact). With this infrastructure, Microsoft analyzes how a person’s voice sounds during speech, breaks this information into separate components to make it usable, and uses the data it was trained on to work out how speech would sound beyond the three-second sample. The system can reportedly imitate even the speaker’s intonation and the acoustics of the environment in the input recording. VALL-E is still under development and shows great potential for the future.