AI takes a giant leap, ChatGPT can pack up and go home!

Claude, the AI that claims to be “honest, helpful and harmless”, gets numerous improvements with its new large language model, available in three versions. Enough to push the limits of generative AI and pull ahead of ChatGPT.

The race for artificial intelligence continues unabated! OpenAI (ChatGPT), Microsoft (Copilot) and Google (Gemini) may have thought they had a head start, but smaller companies that are also developing large language models (LLMs) seem to be gradually gaining ground. One of them is the French start-up Mistral AI, which recently unveiled Le Chat, its chatbot powered by Mistral Large, billed as a serious competitor to ChatGPT (see our article). The American start-up Anthropic, founded by former OpenAI employees, does not intend to be left behind either and has just presented, in a blog post, its new large language model, Claude 3! The “ethical” AI claims to outperform OpenAI’s GPT-4 and Google’s Gemini 1.0 on many benchmark tests. That is ambitious!

Claude 3: what changes compared to the previous version?

Claude 3 is an evolution of the Claude language model and works on the same principle as ChatGPT: you simply submit a question through an interface and the AI responds in natural language. Anthropic announced three language models: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus.

They feature enhanced capabilities in analysis and forecasting, nuanced content creation, code generation, and conversing in languages other than English, such as Spanish, Japanese, and French, as well as in solving mathematical problems and simulating reasoning. The Opus model is the most advanced in the series, “presenting near-human levels of understanding and comfort on complex tasks, approaching general intelligence”. No less! It also scores higher than GPT-4 on many key criteria in the main benchmarks. Anthropic also promises near-instant results for tasks like live customer chat, auto-completion, and data extraction.

Also new: the models are now multimodal and can process a wide range of visual formats, including photos, tables, charts and technical diagrams. However, they cannot yet generate images.

Claude 3 Opus: the most powerful of the large language models

Claude 3 Opus is Anthropic’s most capable model, achieving impressive results even on complex tasks. “It can respond to open-ended questions and unseen scenarios with remarkable fluidity and human-like understanding. Opus shows us the extreme limits of what is possible with generative AI”, explains the company.

It outperforms its competitors on most common evaluation benchmarks for AI systems, including undergraduate-level expert knowledge (MMLU), with a score of 86.8%, compared to 86.4% for GPT-4 and 83.7% for Gemini 1.0 Ultra, and grade-school mathematics (GSM8K), with a score of 95% compared to 92% for GPT-4 and 94.4% for Gemini 1.0 Ultra. The gap is even wider on some programming benchmarks, like HumanEval, where Opus achieved a score of 84.9%, compared to just 67% for GPT-4 and 74.4% for Gemini 1.0 Ultra. It also slightly outperformed OpenAI’s model on several general knowledge and reasoning tests.

© Anthropic

To compare different AI models, we use a unit of measurement called the token, which indicates how much text a model can analyze and keep in memory at once. Claude 3 Opus has a context window of 200,000 tokens. That is, you can give it documents totaling around 150,000 words and ask it questions about them. Anthropic adds that the model can accept inputs exceeding one million tokens. This is roughly what Gemini 1.5 does, and much more than GPT-4 and its 128,000 tokens. As a result, it can be used for task automation (planning and execution of complex actions through APIs and databases, interactive coding), for research and development (research review, brainstorming and hypothesis generation, drug discovery) and for strategy (advanced analysis of charts and graphs, financial and market trends, forecasting).
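To make the arithmetic concrete, here is a minimal Python sketch that converts the token figures quoted above into approximate word counts. The ratio of 0.75 words per token is only the rule of thumb implied by the article's own numbers (200,000 tokens for about 150,000 words); real ratios vary with the language and the tokenizer, and the function names are purely illustrative.

```python
# Rule of thumb implied by the figures above: 200,000 tokens ~ 150,000 words.
# This is an approximation, not an exact property of any tokenizer.
WORDS_PER_TOKEN = 150_000 / 200_000  # 0.75


def tokens_to_words(tokens: int) -> int:
    """Estimate how many words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def fits_in_context(word_count: int, context_tokens: int = 200_000) -> bool:
    """Check whether a document of `word_count` words should fit in the window."""
    needed_tokens = word_count / WORDS_PER_TOKEN
    return needed_tokens <= context_tokens


# Context sizes as quoted in the article.
context_windows = {"Claude 3": 200_000, "GPT-4": 128_000}

for name, tokens in context_windows.items():
    print(f"{name}: {tokens:,} tokens ~ {tokens_to_words(tokens):,} words")
```

Under this assumption, GPT-4's 128,000 tokens correspond to about 96,000 words, which gives a sense of the gap described above.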

Claude 3 Sonnet and Haiku: more affordable models

Anthropic also presented its two other Claude 3 models, Sonnet and Haiku, which also have a context window of 200,000 tokens. The first is described as “the ideal balance of intelligence and speed, especially for enterprise workloads.” Its performance is solid and its cost is lower than that of comparable models. In short, it is built for endurance in large-scale AI deployments. Its uses are varied: data processing, sales, code generation, quality control, text analysis from images, etc.

© Anthropic

For its part, Haiku is presented as the fastest and most compact model, with almost instantaneous responsiveness. “It responds to simple queries and requests with unparalleled speed. Users will be able to create seamless AI experiences that mimic human interactions”, explains Anthropic. In particular, it can be used for content moderation (detecting risky behavior or customer requests) and cost-saving tasks (optimized logistics, inventory management, extracting information from unstructured data).

Claude 3: where can you try the models?

What sets Claude apart is that it is meant to be an “ethical” AI. Compared to Claude 2, Claude 3 Opus, Sonnet and Haiku are significantly less likely to refuse to respond to prompts that brush against the system's guardrails. Claude 3 models thus show a more nuanced understanding of requests: they recognize real harm and refuse to answer harmless prompts less often.

© Anthropic

But it goes much further. Alex Albert, prompt engineer at Anthropic, had fun trying to trap the Opus model with a “needle in the haystack” test. It involves inserting a random sentence (the needle, here about pizza toppings) into a body of documents on entirely unrelated subjects (the haystack, here focused on programming languages), and then asking a question that can only be answered using the information contained in the needle. Not only did Opus manage to find the famous “needle”, but it also recognized that the sentence had been inserted to test its attention and that it had no connection with the rest of the documents provided. “I suspect this ‘fact’ about pizza toppings may have been inserted as a joke or to test whether I was paying attention, as it doesn’t fit in at all with the other topics. The documents contain no further information on pizza toppings”, it replied. Impressive!
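For readers curious about how such a test can be set up, here is a minimal Python sketch of the idea. It is not Anthropic's actual test harness: the function names, the sample documents and the needle sentence are all hypothetical placeholders, and the call to the model itself is deliberately left out.

```python
import random


def build_haystack_prompt(documents, needle, question, seed=0):
    """Hide `needle` at a random position among `documents`, then append the question.

    Returns the full prompt and the position where the needle was inserted.
    """
    rng = random.Random(seed)  # fixed seed so the test is reproducible
    position = rng.randint(0, len(documents))
    stuffed = documents[:position] + [needle] + documents[position:]
    prompt = "\n\n".join(stuffed) + "\n\nQuestion: " + question
    return prompt, position


def found_needle(answer, expected_keywords):
    """Crude check: does the model's answer mention every expected keyword?"""
    lowered = answer.lower()
    return all(keyword.lower() in lowered for keyword in expected_keywords)


# Hypothetical haystack (unrelated documents) and needle (the hidden sentence).
haystack = [
    "Python is a popular language for data analysis.",
    "Rust focuses on memory safety without a garbage collector.",
    "Go was designed for simple, concurrent server software.",
]
needle = "The best pizza topping is mushrooms."  # illustrative placeholder
question = "According to the documents, what is the best pizza topping?"

prompt, position = build_haystack_prompt(haystack, needle, question)
print(prompt)
```

The resulting prompt would then be sent to the model, and `found_needle` applied to its reply, for example with the keyword list `["mushrooms"]`. A real evaluation repeats this with much longer haystacks and with the needle placed at many different depths.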

Claude 3 was developed primarily for professional users since, according to the company, it is particularly well suited to following “complex, multi-step instructions” and “to adhere to brand voice and response guidelines, and to develop customer experiences our users can trust”. The Sonnet and Opus versions are already available through the Claude AI chatbot and the Anthropic API in 159 countries, but not in France. Sonnet is accessible to users of the free version of Claude, while Opus is only available to Claude Pro subscribers. As for Haiku, it will be available soon.


