Between the media and the AI giants, the hour of confrontation has come


This time, they say, they want to extract billions of dollars from Big Tech and will no longer settle for hundreds of millions. The American media, whose example carries weight elsewhere in the world, fully intend to take advantage of the current frenzy around artificial intelligence. In their sights are the few dozen companies that dominate the large language model (LLM) sector, such as OpenAI, Anthropic, Hugging Face, Google AI or Meta. But everything is complex in this landscape, where benchmarks are lacking and where, on the press publishers' side, the united front did not last three months.

Last spring, the CEO of the Associated Press (AP) had nevertheless taken a firm stance against AI operators, calling for a collective approach to the issue. Daisy Veerasingham had spoken of the need for a legal framework “to protect our intellectual property from these tools which are always getting smarter thanks to our work”. Pragmatism got the better of her stance. On July 13, AP announced a major agreement with OpenAI, the creator of ChatGPT, which will be able to use the entire archive of this powerful 177-year-old news agency, which produces thousands of articles every day that are picked up by its 1,300 customers. The leading player in artificial intelligence has thus managed to breach a major dam.

Systematic looting

What is this about? Large language models need huge volumes of text to work. The greediest is currently Google AI’s Megatron-Turing model, which has ingested 1,300 billion documents, three times more than the latest version of ChatGPT. True to the principle of asking for forgiveness rather than permission, the designers of these models shamelessly captured everything they could: open-access texts such as Wikipedia, public domain literature, texts from the UN and other major global organizations, and, while they were at it, most published books and huge stores of information such as the archives of the mainstream media. Inevitably, this came to light, because about 20% of these corpora consist of news in the broad sense. Hence the outcry from publishers, who denounced it as looting. They are not the only ones: famous authors, like their publishing houses, are also up in arms.

The precedent of social networks

The Associated Press’ choice is all the more surprising since there are at least three reasons not to rush.

One, social networks had promised the moon to press publishers who would bring them their content for free: traffic to their sites and additional advertising revenue. The result did not live up to expectations. Financial flows have been thin. Audiences have certainly increased, but mostly thanks to occasional readers, who are hard to monetize through advertising and hard to sell a subscription to, subscriptions being the main source of income for quality media. Along the way, Big Tech's weight in the news business has grown, and it has captured a good part of the direct relationship with news consumers.

Two, the damage is done. The AI is out of the tube and will not go back in. The major language models developed over the past year by OpenAI, Google or Meta have already been trained. Even though they are constantly evolving, their training data has been dissolved into the algorithms and transformed at a cost of hundreds of millions of dollars in investment, which in itself constitutes a solid legal shield. It is therefore practically impossible to go back, short of obtaining a court order for the destruction of these models.

Three, valuing the contribution of the media to these artificial intelligence tools is extraordinarily complicated. Take the catalog of a film distributor, for example: its value comes from productions licensed under specific price and duration conditions. The same goes for a documentary database, which derives its value from the content authorized by publishers. In both cases, the rights holder receives income linked to how the licensed material is exploited, whether through distribution or consultation. Simple. None of this applies to AI. How can one measure the contribution of the New York Times, Agence France-Presse or Le Monde to ChatGPT or Bard, Google’s LLM? How can one distinguish between the articles from decades of archives, which are essential for an AI to acquire some semblance of historical memory, and the live news feeds, which a model needs in order to understand and report current events and on which its practical value will largely depend? This is the challenge of future discussions between publishers and AI players.

The hope of high compensation

Having learned from their experience with social networks and internet search, some publishers will seek to obtain larger sums than in the past. No one knows exactly how much digital platforms have paid out to the media in recent years. The platforms have played the division card remarkably well, encouraging privately negotiated, one-on-one agreements. We simply know that a company like Meta has paid some twenty million dollars a year to the New York Times, not even 2% of the newspaper’s revenue, with the Wall Street Journal being even worse off, with less than 1% of its revenue coming from Meta. This time, the American publishers promise, the sums will be higher.

The looming offensive is led by Barry Diller, the architect of one of the greatest success stories in digital media in the United States. His conglomerate InterActiveCorp (IAC) is made up of dozens of consumer services that together generate more than 5 billion dollars in revenue and 500 million in profits. Renowned for his fighting spirit, Diller wants to take legal action to improve his negotiating position. He also intends to rally as many titles as possible to bring the AI giants to heel. The battle has only just begun, but its repercussions will be significant for all Western media.
