Meta downloaded a data set containing millions of books via torrent

Meta downloaded a large data set containing millions of pirated books via torrent, and the data was used to train artificial intelligence models.

Last month, Meta admitted that it had downloaded a data set of tens of millions of pirated books known as LibGen. New evidence shows that Meta downloaded at least 81.7 terabytes of data as torrents through Anna’s Archive. The company, which had also previously torrented 80.6 terabytes of data from LibGen, was accused last year of “illegal training of artificial intelligence models on pirated books”.

A statement supported by more than 8,500 people last year criticized the use of written works, without permission and without payment, in artificial intelligence systems powered by large language models such as Meta’s Llama. “These technologies mimic our language, stories, style and ideas. Millions of copyrighted books, articles, essays and poems are the food for artificial intelligence systems, endless meals for which there has been no bill,” the authors wrote. They said that the companies developing these systems had not licensed the works from publishers, and that they had been harmed as a result.

The authors demanded that developers of artificial intelligence and large language models take the following steps:

1. Obtain permission before using our copyrighted materials in your generative artificial intelligence programs.
2. Compensate authors for the past and ongoing use of our works in your generative artificial intelligence programs.
3. Compensate authors fairly for the use of our works in the output of artificial intelligence systems, whether or not that output violates existing laws.

This statement was not the first time the issue had come up. ChatGPT, developed by OpenAI, is built on a language model called “GPT”, which is trained on data gathered from many sources. It is not known exactly what these sources are, but according to recently filed lawsuits, they even include material obtained via torrent. Behind these lawsuits are the famous comedian and writer Sarah Silverman and the writers Christopher Golden and Richard Kadrey. The three sued both OpenAI, over ChatGPT, and Meta, over its large language model “Llama”, for copyright infringement.

The case against OpenAI rests on the fact that ChatGPT summarizes the authors’ books when prompted to do so. The authors say that this violates their copyrights. The separate case against Meta states that the authors’ books were accessible in the data sets used to train the Llama language model. In both cases, the authors say they did not consent to their copyrighted works being used as training material for the companies’ artificial intelligence models. The three are seeking statutory damages and restitution.
