AI: the media are fighting for their survival

The future of the media looks no worse than usual. This year, at the International Journalism Festival in Perugia, Italy, discussions were dominated by the sudden surge of generative artificial intelligence. These systems can churn out text by the yard, generate strikingly realistic news images, and will soon create videos from a simple natural-language description. They also happen to be the most agile and powerful vector of disinformation and mass manipulation.

Faced with this tsunami, the profession admits to being in the dark, which is understandable given how recent the phenomenon is. Generative AI took off in 2022 with the first consumer versions of ChatGPT for text and Dall-E, Stable Diffusion and Midjourney for images. “It’s too early to decide on specific things. Governments are just starting to legislate on social media, so it might take a bit of time,” notes Charlie Beckett of the London School of Economics, who was speaking in Perugia. Also present was Phil Chetwynd, news director of Agence France-Presse (AFP): “I must have had 50 conversations on these subjects; no one has a very clear vision.”

Yet, like all major international media, AFP has a great deal at stake with AI. With its 2,400 employees, the agency produces an immense amount of information. Every day it publishes more than 600 articles in French, English, Spanish and Arabic; its fast-growing video operation is taking market share from competitors, producing 120 stories a day and 1,200 hours of live footage a month. Finally, there is photography, with 3,000 to 4,000 news images distributed daily.

This enormous pipeline of information is an absolute dream for the major global operators of so-called generative artificial intelligence such as OpenAI (creator of ChatGPT), Microsoft, Google, and the hundreds of start-ups flocking to this market. What they have in common is that they feed, on a vast scale, on these troves of textual, photographic and, soon, video data. Large language models (LLMs) such as GPT-4 are trained on hundreds of billions of words drawn from everything produced in multiple languages.

More reliable, therefore more usable

Obviously, not all of this data is of equal quality. Blog comments or the chatter of websites with no journalists do not have the same value as the dispatches of a major outlet’s special correspondent in Ukraine. For an artificial-intelligence system, the latter are infinitely more valuable because they are rarer, more reliable, and therefore more usable.

Today, if we submit to the Midjourney image generator a prompt (an instruction) such as “generate the image of an old lady supported by a Ukrainian soldier walking through the ruins of Bakhmut”, it is certain that the photographic corpus used to train the generative model includes work by AFP, Associated Press, Getty or Reuters photographers who were on the ground. And in practice, a publisher with no money, or no ethics, will prefer to create a computer-generated image for free rather than pay $100 for an original photo.

The strategic question thus becomes an economic one as well. These media spend millions of euros to cover the news; AFP alone permanently maintains 30 to 40 people in Ukraine. How should these publishers be compensated for their contribution to AI systems that will inevitably compete with them?

For the moment, the fait accompli reigns. “We have all been approached about the use of our databases [of articles and images]. Some inform us that they collect our data for testing purposes and even promise to return it to us,” notes Fabrice Fries, president of AFP, with a smile. “For now, in any case, AI operators are invoking fair use [the reasonable use of third-party content], so there is no compensation.” The problem is that the notion of fair use is far broader (abusively so, according to some) in Anglo-Saxon law than in French law. Hence the pushback now taking shape: AFP is working with its partner Getty Images to enforce its rights, and Robert Thomson, CEO of News Corp (Rupert Murdoch’s group), has met with Microsoft, which is deploying an advanced version of ChatGPT in its Bing search engine.

Existential threat to the press

Like AFP, Reuters and Bloomberg, the Associated Press is reacting strongly. Its chief executive, Daisy Veerasingham, evokes an existential threat to the press: “We need to organize ourselves to create a legal framework allowing us to protect our intellectual property, because these [artificial-intelligence] tools learn and become ever smarter thanks to our work.” For publishers, the dilemma therefore comes down to this: should they seek to maximize revenue from their contribution to the development of generative AI, or on the contrary simply prohibit any collection, on the grounds that “feeding the beast” will inevitably devalue their original production and create their future competition?

For the first option, many publishers dream of reproducing the mechanics of the press’s neighboring rights (“droits voisins de la presse”, or DVP). For several years, European publishers, particularly French ones, have been building sophisticated systems intended to compensate rights holders for editorial content displayed in search engines, mainly Google. The results are mixed: four years after the law was passed, negotiations over its implementation are still unfinished. Worse, although the system rests on collective management, the major media have managed to strike deals directly with Google, generating substantial revenue for themselves and leaving only crumbs for smaller publishers.

For Emmanuel Parody, who negotiates DVPs, the existing legal arsenal already allows publishers to enforce their rights against AI operators: as soon as operators scrape and store data, remuneration of publishers is required.

There remains, however, a big difference between the compensation system wrested from Google and its extension to generative AI. In the case of the search engine, the point was to pay for the snippets accompanying links that are supposed to send readers back to publishers’ sites. Google long resisted, arguing that the exchange was balanced: the search engine accounts for at least one-third of the traffic of global news sites, and each referral generates advertising revenue for the publisher. Except that this is less and less true, argues the Spiil (Syndicat de la presse indépendante d’information en ligne), which represents small publishers. According to it, Google is becoming a destination site where users find the answers they seek without going anywhere else.

With AI-generated text, there is no longer any balance at all. A question put to ChatGPT produces a paragraph with no reference or source, and no way to trace it back to the rights holder. It is an absolute black box, except perhaps for Microsoft’s Bing, which is supposed to include annotations pointing to sources. Hence the dream of a flat fee. But here again, practical implementation is complex. The value of content fed to an AI is twofold: what it costs to produce and the use that is made of it. A local news site with 10,000 readers cannot be charged the same amount as OpenAI with its 100 million ChatGPT users, or the millions of others creating images with Midjourney or Dall-E.

The media will therefore have to choose between two logics: “circling the wagons” to shield their content from the AI giants’ vacuum cleaner, or collaboration, with a short-term gain, but which may amount to supplying the rope with which they will be hanged.
