An artificial intelligence capable of answering almost any question, or of listening to a speech and summarizing it… The latest version of ChatGPT, presented this week, pushes the fascinating feats of this generative AI even further. It is a technology that promises to disrupt many sectors, from financial analysis to advertising, including research and even medicine. In this area, Google also unveiled this week the latest progress of its Med-PaLM algorithm, a health-specialized competitor to ChatGPT. It now scores 85% on the exams that American medical students take at the end of their studies.
However, despite the buzz of recent years, it is clear that, with rare exceptions, AI has not yet brought radical changes or spectacular advances to medical practice. And for good reason: most of the AIs that have made headlines in recent years have not yet proven their value in rigorous scientific studies. This step is unavoidable, but it promises to be a difficult one, as Professor Jean-Emmanuel Bibault, a radiotherapist at the Georges Pompidou European Hospital and an Inserm researcher in the field of AI applied to medicine, reminds us in his recent book 2041: The Medical Odyssey (Éditions des Équateurs). Interview.
L’Express: The promises of artificial intelligence are vast, but they are still struggling to make their way into clinical practice…
Professor Jean-Emmanuel Bibault: Many projects have been “oversold” in recent years; think of IBM’s Watson, which claimed to help treat cancer before failing… So far, the only two areas where AI algorithms are used routinely remain radiology, to help read images, and radiotherapy. Deep learning technologies make it possible to “contour” the areas to be irradiated in two to three minutes, when it could take a doctor half a day. At this stage, however, this is not yet validated for every tumor site.
AI will no doubt also soon be used in pathology, where the task is to identify abnormal cells on biopsy sections. As with radiology or radiotherapy, these remain medical tasks based on image analysis in the broad sense. In a second phase, we will see the arrival of predictive AI: the aim will be to anticipate how a patient’s disease will progress, in order to adapt care accordingly. The impressive progress presented by Google and OpenAI, as we have seen again in recent days, makes these developments all but inevitable. But their adoption in medical practice will take time, because it will be much more difficult to assess the performance and safety of algorithms in this area.
Are the tools already used in clinical routine well validated?
Assistive tools are easier to assess than predictive and decision-making tools, because with these applications there is always human supervision. With the personalized-care and prediction systems that will arrive in the next few years, it will be much more complicated, because doctors cannot tell in advance whether a patient will recover or not. We will need clinical trials in which, as with drugs, one cohort of patients is treated according to the AI’s recommendations and another according to a strategy defined by a human, and we will see over the following years which one achieves the best results.
That could take a very long time…
It will all depend on the disease. For lung cancers, which unfortunately progress quickly, we will not need a long follow-up: within two years, we will already have a good idea of how effective these devices are. On the other hand, for slow-growing prostate tumors, it will take longer.
As always, we are caught between the needs and expectations of patients on the one hand and, on the other, the need to avoid tools that are potentially dangerous because they have been poorly evaluated. After all, an AI used to make a medical decision could perform worse than a human. We must therefore be certain that the evaluation has been properly conducted.
In the United States, some AI diagnostic aids are already on the market. Have they been properly evaluated?
Not really. The validation of medical devices depends on the level of risk involved in their use: a new dressing will be level 1, whereas software intended to diagnose lung cancer will be level 4. The FDA has authorized around fifteen level-4 AIs, for diagnosing cardiac disorders, cerebral hemorrhages, or tumors, but almost none have been rigorously evaluated. This is a problem, because these devices are being marketed without anyone knowing whether they are effective and safe.
The evaluation of these medical devices is itself still a field in its infancy…
Scientists are beginning to publish methodological guidelines. The general idea is that there should be three stages. First, an “in silico” validation: the algorithm is run on an existing dataset whose outcomes are already known. Next comes a first “real-life” study, and finally a larger-scale therapeutic trial. This is similar to what is done for drugs, even if not everything carries over directly, because AI raises specific questions.
Which ones?
Some algorithms can continuously learn from new cases. This is very specific to AI: in that case, we will have to check that their performance does not degrade over time. Another example: it will be necessary not only to validate the algorithm, but also to make sure its interface is well understood, so that it is used appropriately.
In fact, though, the question of evaluation remains largely unexplored, and very few studies meet all the quality criteria that would be necessary. These studies will come, however, because the reviewers at major scientific journals will gradually demand them. Of course, these criteria will also have to be approved and adopted by the health authorities responsible for authorizing medical devices. At this stage, most of them are only just beginning to work on it.
What is at stake for doctors?
Physicians should not reject these new technologies outright. On the contrary, our involvement is essential, because it is also our responsibility to offer patients useful and safe treatments. We are the only ones who take the Hippocratic oath: primum non nocere, first do no harm. Practitioners and medical students will therefore need training to distinguish good algorithms from useless ones, modeled on the biostatistics courses that help us interpret studies of drug effectiveness. In addition, many believe that hospitals will have artificial intelligence departments to continuously assess these tools, as they already do for CT scanners and MRI machines. We are going to see many new professions emerge.
In your book, you warn of the risk that doctors will eventually lose their skills through delegating tasks to algorithms. Aren’t we still a long way from that?
Not so far. Take “contouring” in radiotherapy: if we entrust this task to AI, we will become less and less able to do it ourselves. In that case, how are we going to teach it to future doctors? And if no one is left to teach it, how will we then check that the AI isn’t making mistakes? These are quite dizzying prospects… For now, these questions arise only for a small number of medical procedures. But who can say where we will be 10 or 15 years from now?