OpenAI has just presented ChatGPT-4o, its new AI model, equipped with impressive interaction capabilities combining text, voice and images with breathtaking responsiveness. A stunning technology that could serve as the basis for Apple’s future Siri.
During a special event broadcast online on May 13, 2024, OpenAI, creator of the famous ChatGPT, presented a major evolution of its flagship language model: GPT-4o, pronounced “four oh”. The “o” stands for “omni”, because this new model is natively multimodal: you can talk to it by typing, by voice or by showing it photos or videos, or even all of these at once. Even better, the new model will be available online for free. The May 13 date was no coincidence: it fell just one day before Google’s announcements at its Google I/O conference, where AI will of course be at the center of attention and of every demonstration.
“It is very important to us to have a product that we can make truly available and widely accessible to everyone,” declared Mira Murati, OpenAI’s Chief Technology Officer (CTO), at the start of the conference. “And we’re always trying to find ways to reduce friction, so everyone can use ChatGPT from anywhere.”
ChatGPT-4o: a new free ChatGPT for all
Free use will, however, be limited, notably in the number of messages (text, voice or visual) that can be exchanged with ChatGPT each day, a restriction that will naturally weigh much less on paying OpenAI users. Subscribers to ChatGPT Plus ($20 per month) will also be the first able to install the desktop version of the ChatGPT application, which was previously available only on smartphones. Surprisingly, this app is for now available only on Mac, with a Windows version planned for later in the year. This ChatGPT for macOS features a user interface similar to Spotlight, the search feature built into Macs for many years, and can be summoned with a simple keyboard shortcut (Option + Space), very close to Spotlight’s own (Command + Space). The Mac application integrates all the newly announced features: multimodal answers, data analysis and chart creation, questions asked by photo or video, file analysis (to obtain a summary, a rewrite or a simple proofread, for example), and access to the GPT Store to download GPTs, modules dedicated to specific subjects. Through the app, ChatGPT will also remember your conversations so that it can draw on context specific to you in its subsequent responses.
ChatGPT-4o: a multimodal AI
But the star of the show was of course the introduction of GPT-4o, the new large language model (LLM). According to OpenAI, GPT-4o offers the same level of “intelligence” as GPT-4 while being multimodal (text, audio and vision) at all times, and all with breathtaking response speed.
“So far, we’ve mostly focused on improving the intelligence of these models, and they’ve gotten pretty good,” explained Mira Murati. “But this is the first time we have made such progress in terms of ease of use. This is critically important because it touches on how we envision the future of our interaction with machines. And we believe GPT-4o truly shifts the paradigm toward the future of collaboration. Interaction becomes natural and much simpler.”
GPT-4o can process any combination of text, audio and images at the same time, and generate responses that are just as multimodal. The stated goal is to enable real-time communication with the machine, and it must be admitted that the demonstrations carried out live during the conference were particularly impressive. According to OpenAI, ChatGPT-4o can respond to audio input in as little as 232 milliseconds, and in 320 milliseconds on average: a response time similar to that of a natural conversation between two humans.
In terms of language processing, GPT-4o matches the performance of GPT-4 Turbo on English text and on writing or analyzing software code, and it is significantly better in all other languages, in both understanding and speed. In short, OpenAI states that for those who access it through the API, that is, pay-as-you-go access to the model, GPT-4o is twice as fast as GPT-4 Turbo while costing half as much.
ChatGPT-4o: stunning demonstrations
To really appreciate the dazzling progress made by OpenAI, particularly in terms of speed, nothing beats watching the conference itself; the demos start around the ninth minute. In the YouTube interface, don’t forget to turn on automatic subtitles with French translation, produced by AI, naturally…
In addition to the full conference, OpenAI also released a series of videos demonstrating ChatGPT-4o’s new capabilities, gathered in a playlist. We show some of the most breathtaking examples below.
In a fairly classic example of real-time translation, we see how ChatGPT-4o’s extreme responsiveness makes the conversation more natural.
Here, ChatGPT-4o plays referee between two people playing rock, paper, scissors. It is even ChatGPT-4o that suggests the game when the two humans ask it what they could play. An iPhone propped against a coffee cup watches the two players through its front camera, which lets ChatGPT-4o see what each of them chose and announce the winner.
One thing all these demonstrations have in common is the remarkable quality of ChatGPT-4o’s synthetic voice. Although slightly robotic inflections can still be heard here and there, its ability to modulate its intonation to convey a specific emotion or stress a particular point is truly astonishing. The speed of interaction is further improved by another new feature: you can interrupt ChatGPT-4o at any time, without waiting for its spoken response to finish. This is particularly clear in the demo in which one of the developers asks ChatGPT-4o to tell a story and to change its tone, from the most neutral to the most dramatic. And even to sing it!
GPT-4o is rolling out to ChatGPT Plus and Team users now, and it will be available to Enterprise users soon. It is also rolling out to free ChatGPT users. Just go to ChatGPT.com to try it, without even needing to create an account.
ChatGPT-4o: a version for Mac first
Beyond the incredible technological demonstration, OpenAI’s announcements point to the inevitable evolution of how we interact with our machines, whatever form they take (computers, smartphones, glasses, brooches, pendants, etc.). It is an evolution that almost every work of science fiction has both predicted and longed for over the decades: from Star Trek to Iron Man, via Asimov’s robots, people address machines by voice, in natural dialogue, to ask them to carry out any task. In Star Trek IV: The Voyage Home, for example, when the crew of the starship Enterprise travels back in time to 1986 (the year the film was released), the engineer Scotty speaks aloud to a Mac of the era. When that doesn’t work, he picks up the mouse with a knowing look and speaks into it as if it were a microphone, before finally resigning himself to typing on the keyboard.
This may raise a smile, but the Mac example is not trivial, because ChatGPT-4o’s new features could soon be integrated into today’s Macs. Rumors are growing that Apple and OpenAI have signed an agreement to integrate all or part of GPT into the next versions of macOS, iOS and iPadOS. And as luck would have it, the desktop version of ChatGPT announced by OpenAI is reserved for Macs first! It is almost a snub to Microsoft, which has nevertheless invested more than 10 billion dollars in OpenAI. Not to mention that Copilot, Microsoft’s AI assistant, is built on OpenAI’s GPT models, and that the Redmond firm is working hard to integrate it into both Windows and its Office suite.
ChatGPT-4o: soon in Apple’s new Siri?
From there, it is only a short step to imagining a (much) improved version of Siri, a step all the easier to take since the voice assistant badly needs to improve. We already know that Apple’s next developer conference, WWDC, which begins on June 10, 2024, will have AI as its main theme. Beyond progress in conversation, the benefit of integrating artificial intelligence directly into the operating system is easy to see. Imagine being able to ask your Mac: “Find me the file that Eric sent me, or maybe it was Jules, the day we were talking about the purchase of the hailstorm, and that covers our next commercial offer.” You would be hard-pressed to ask Spotlight, the current Mac search tool, that question today. Of course, for this to work, the AI tool would need full or near-total access to your data. You might hesitate to grant that to OpenAI. But it may be easier to grant such access to an AI from Apple, which has made the protection of our private lives and data one of its mantras. We can’t wait for June 10 to find out more!