When ChatGPT got the famous cognitive reflection test all wrong

David Haziza

Why ChatGPT should be treated as an enemy

In just a few weeks, the conversational artificial intelligence ChatGPT has captured everyone’s imagination and generated an enormous amount of commentary. Everyone has tried to test it, and I was no exception. Like everyone else, I was initially amazed by the quality of the writing and of the arguments developed in our exchanges. The desire to trip up this conversational agent came to me after seeing a question that an account on a social network had put to it: “Quentin’s mother has three children: Huey, Fifi and…?” ChatGPT answered “Loulou”, and not Quentin as it should have. I then wondered how it would react if I gave it the Cognitive Reflection Test (CRT). This famous test was developed by the psychologist Shane Frederick in 2005 to assess an individual’s tendency to give in to intuitive but faulty reasoning.

For example, faced with the following problem: “A bat and a baseball cost 110 euros together. Knowing that the bat costs 100 euros more than the ball, how much does the ball cost?”, you may be tempted to answer “10 euros”, which is completely wrong. The correct answer, 5 euros, is counter-intuitive. This is the first of the CRT’s three problems. The second is: “If it takes 5 machines 5 minutes to make 5 items, how long would it take 100 machines to make 100 items?” Here, the answer is not 100 minutes but 5. As for the last problem: “In a lake, there is a water lily. Every day, its size doubles. If it takes 48 days for the water lily to cover the whole lake, how long would it take to cover half of it?” Again, beware of answering “24 days”: it is actually after 47 days that the water lily covers half of the lake. When I submitted these problems to it, the artificial intelligence answered, respectively: “10 euros”, “one minute” and “23.5 days”! All wrong. The result is amusing, but less interesting than one might think. ChatGPT was simply not designed to solve puzzles. And, contrary to what one might believe, it doesn’t even really answer questions.
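For readers who want the arithmetic spelled out, here is a quick sketch of the standard solutions to the three problems (my own working, not part of Frederick’s test):

```latex
% Bat and ball: let b be the price of the ball in euros.
% The bat then costs b + 100, and the two together cost 110.
\[
b + (b + 100) = 110 \;\implies\; 2b = 10 \;\implies\; b = 5.
\]
% Machines: 5 machines make 5 items in 5 minutes, so one machine
% makes one item in 5 minutes; 100 machines working in parallel
% therefore make 100 items in those same 5 minutes.
% Water lily: the lily doubles every day, so it covers half the lake
% exactly one day before covering all of it: 48 - 1 = 47 days.
```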

“Hallucinations”

The design of the interface makes us feel as if we were having a kind of conversation with it. But that is not at all what is happening. In reality, it is a tool that only predicts words and their sequence from a context. The texts it writes are merely completions that are credible from a probabilistic point of view: strictly speaking, it is not answering our questions at all. The closest model in our everyday lives is the one in our mobile phone that suggests the next word of a text we are typing. Of course, ChatGPT differs from it in the formidable scale of the data that feeds its mathematical model. It is this mode of operation that explains why this conversational agent can produce what some call “hallucinations”: statements that are entirely invented (false quotations, for example) yet credible from a certain point of view.
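To make this concrete, here is a deliberately crude sketch in Python of that “predict the next word” principle, using a toy bigram model trained on a single invented sentence. ChatGPT relies on a vastly larger model and corpus, of course; the point of the sketch is only to show that such a system optimizes for plausible continuations, not for truth.

```python
from collections import defaultdict

# Toy bigram "language model": count which word follows which in a tiny
# training text, then generate text by repeatedly emitting the most
# probable next word. A drastic simplification of ChatGPT, but the same
# principle: no understanding, only "which continuation is likely here?"

training_text = (
    "the ball costs five euros and the bat costs one hundred euros more "
    "than the ball so the bat costs one hundred and five euros"
)

counts = defaultdict(lambda: defaultdict(int))
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    counts[current][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the training text."""
    followers = counts[word]
    return max(followers, key=followers.get) if followers else None

# Generate a "credible" continuation, one word at a time.
word = "the"
output = [word]
for _ in range(5):
    word = predict_next(word)
    if word is None:
        break
    output.append(word)

print(" ".join(output))
```

On this toy corpus the program prints “the ball costs one hundred euros”: a fluent, statistically plausible and perfectly false sentence, which is precisely the kind of “hallucination” described above.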

This is one of the major dangers of this formidable tool: it creates a gap between what it can actually do and what its users take it to be doing. Users will inevitably project a kind of anthropomorphism onto the machine, which risks making them forget that, under ChatGPT’s pen, the true and the false can be intimately entwined as long as their expression is credible in terms of the probabilistic sequence of words.

The user, who will have the impression of conversing with a kind of consciousness that thinks formidably fast and, most often, correctly, risks forming a bond of trust with it, as he would with a fellow human. But while ChatGPT seems to answer all our questions, it never cites sources for its answers. This incredible tool (whose next version, which should be available within the year, would reportedly be 10,000 times more powerful!) risks marking a further stage in the battle that the contemporary world is waging against the notion of truth. One dares not imagine, for example, the harm its use could do to the profession of journalism, which is in economic crisis and is already tempted to use it.
