Google unveils Gemini 2.0, its second-generation AI model. Capable of generating images and audio, it promises much better performance and, above all, heralds the arrival of real AI agents.
Like every other company in the artificial intelligence race, Google is working to integrate AI into virtually all of its products, competing with OpenAI, Microsoft and many others. On Wednesday, December 11, 2024, the Mountain View firm took a new step by presenting Gemini 2.0, its most advanced generative AI model to date, just ten months after the launch of version 1.5.
It outperforms its predecessor in many areas, whether in performance and latency, language understanding, text generation or task execution (translation, summaries, etc.). Above all, its architecture natively supports multimodal processing, which lays the foundation for the next big thing in AI: agents. For Google, Gemini 2.0 marks a turning point in the field.
Gemini 2.0: a much more powerful multimodal AI
One of the big differences from Google’s previous AI models is that Gemini 2.0 is capable of “understanding information through text, video, images, audio and code” and generating this type of content natively. The first version of Gemini relied on external models, such as Imagen for creating images, to respond to user queries. It was therefore a hub of different models rather than a single model in itself.
Several sub-models will be made available over time, each meeting specific needs. For the moment, only 2.0 Flash, the first version of this new generation, is available in experimental form. It brings significant improvements in performance: Google says it is twice as fast as the 1.5 Pro model while maintaining an equivalent level of quality. This version now natively integrates the generation of images and audio in addition to text, making it a truly multimodal model, not only for inputs but also for outputs – its responses can combine multilingual text, images and text-to-speech audio.
Gemini 2.0: AI agents in development
These improvements make it possible to build better AI agents – also called agentic artificial intelligence – that is, components of the language model specifically trained or configured to excel at a particular type of task. One example is Project Astra, presented last May at Google I/O 2024: by pointing a smartphone camera at various objects, the AI can analyze its environment, solve problems, interact with the user in real time, and so on. Gemini 2.0 represents a huge improvement for Project Astra, although it is still in the prototype phase. In particular, it can converse in several languages, and in mixed languages, with a better understanding of accents and rare words; it can also use Google Search, Lens and Maps to assist users more effectively in daily life, remember certain things, and hold a conversation with latency close to that of a fully human exchange.
But this is not the only agent Google is working on. The company also unveiled Project Mariner, a prototype Chrome extension, again built with Gemini 2.0. Its aim is to explore “the future of human-agent interaction, starting with your browser”. Project Mariner is “able to understand and reason about the information displayed on the screen by your browser, including pixels and web elements such as text, code, images and forms, and then use that information through an experimental Chrome extension to do things for you”, explains Google. But it still has many flaws to correct.
Finally, there’s Jules, an agent designed to help developers find and fix broken code by integrating directly into a GitHub workflow. It can detect errors, analyze code logic, propose architectural optimizations and suggest improvements that take industry best practices into account. Agent projects are also underway to apply AI to video games.
Google is taking a careful and methodical approach to the Gemini 2.0 rollout. You can test 2.0 Flash now in the Gemini app on mobile devices or via the web interface, by selecting 2.0 Flash Experimental from the drop-down menu that lists all the available models.
Third-party developers can also access it through the AI Studio and Vertex AI platforms, but some advanced features such as image and audio generation are currently reserved for preferred partners. Note that this is early access: the official release is scheduled for January 2025. Finally, Google will begin integrating Gemini 2.0 into products across its ecosystem, such as Gmail and Drive, at the beginning of next year.
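For developers trying the experimental model through AI Studio, access goes through the standard Gemini API. The snippet below is a minimal sketch assuming the google-generativeai Python SDK and the experimental model identifier gemini-2.0-flash-exp; the exact model name and the features available to you may differ depending on your access level.

```python
# Minimal sketch: calling the experimental Gemini 2.0 Flash model
# through the Gemini API (google-generativeai SDK).
# The model name "gemini-2.0-flash-exp" is an assumption and may vary.
import os
import google.generativeai as genai

# API key obtained from Google AI Studio
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Instantiate the experimental Gemini 2.0 Flash model
model = genai.GenerativeModel("gemini-2.0-flash-exp")

# Simple text prompt; native image and audio output is limited to
# early-access partners at launch, so only text is requested here.
response = model.generate_content(
    "Summarize the key new features of Gemini 2.0 in three bullet points."
)
print(response.text)
```

The same model can be reached through Vertex AI for teams already on Google Cloud; only the authentication and client setup differ, not the prompting pattern shown above.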