The AI software ChatGPT scored at or close to the 60% passing threshold for the United States Medical Licensing Exam (USMLE), according to new US research.
The USMLE is a notoriously difficult series of three exams, covering most medical disciplines, that are required to obtain a medical license in the US. ChatGPT scored between 52.4% and 75% across the three exams, according to the team, who add that its answers were internally coherent and frequently contained insights.
We asked experts to comment on the research.
Dr Simon McCallum, Senior Lecturer in Software Engineering, Te Herenga Waka — Victoria University of Wellington, comments:
“This particular study was conducted in the first few weeks of ChatGPT becoming available. There have been three updates since November with the latest on January 30th. These updates have improved the ability of the AI to answer the sorts of questions in the medical exam.
“Google has developed a Large Language Model (the broad category of tools like ChatGPT) called Med-PaLM, which ‘performs encouragingly on the axes of our pilot human evaluation framework.’ Med-PaLM is a specialisation of Flan-PaLM, a system released by Google that is similar to ChatGPT, trained on general instructions. Med-PaLM focused its learning on medical text and conversations. ‘For example, a panel of clinicians judged only 61.9% of Flan-PaLM long-form answers to be aligned with scientific consensus, compared to 92.6% for Med-PaLM answers, on par with clinician-generated answers (92.9%). Similarly, 29.7% of Flan-PaLM answers were rated as potentially leading to harmful outcomes, in contrast with 5.8% for Med-PaLM, comparable with clinician-generated answers (6.5%).’
“Thus, ChatGPT may pass the exam, but Med-PaLM is able to give advice to patients that is as good as that of a professional GP. And both of these systems are improving.
“ChatGPT is also good at simplifying content so that individuals can understand medical jargon or complex instructions. Asking the AI to simplify until the language fits the needs of the patient will change people’s ability to understand medical advice and remove the potential embarrassment of saying you do not understand.
“Within university education we are having to pivot almost as fast as at the start of the pandemic to account for the ability of AI to perform tasks which were traditionally a sign of understanding. There is also a massive cultural shift when everybody has access to a tool that can assist in written communication. Careers and jobs which were seen as difficult may be automated by these AI tools. Microsoft has announced that ChatGPT is now integrated into Microsoft Teams Premium, where it will act as a meeting secretary, summarising meetings and creating action items. Bing will also incorporate a more advanced ChatGPT model, linking it with up-to-date search information.
“Society is about to change, and instead of warning about the hypochondria of randomly searching the internet for symptoms, we may soon get our medical advice from Doctor Google or Nurse Bing.”
Conflict of Interest statement: “I am an active member of the Labour Party (Taieri LEC Chair). I am leading Te Herenga Waka — Victoria University of Wellington’s response to AI tools.”
Expertise and background: “I have a PhD in Computer Science (in neural networks like those used in ChatGPT) from the University of Otago. I taught using GitHub Copilot last year. Copilot uses the same GPT model family as ChatGPT but was focused on programming languages rather than human languages. My research has been in Games for Health and Games for Education, where AIs in games have been part of the tools integrated into research. I have also applied ChatGPT to many of our courses; as of December it passes first-year courses and some of our second-year courses, and it may do even better now.”
Dr Collin Bjork, Senior Lecturer in Science Communication and Podcasting at Massey University, comments:
“The claim that ChatGPT can pass US medical exams is overblown and should come with a lengthy series of asterisks. Like ChatGPT itself, this research article is a dog and pony show designed to generate more hype than substance.
“OpenAI had much to gain by releasing a free open-access version of ChatGPT in late 2022 and fomenting a media fervor around the world. Now, OpenAI is predicting $1 billion in revenue in 2024, even as a ‘capped-profit’ company.
“Similarly, the authors of this article have much to gain by releasing a free open-access version of their article claiming that ChatGPT can pass the US Medical Licensing Exams. All of the authors but one work for Ansible Health, ‘an early stage venture-backed healthcare startup’ based in Silicon Valley. At two years old, this small company will likely need to return to its venture capital investors soon to ask for more money, and the media splash from this well-timed journal article will certainly help fund its next round of growth. After all, a pre-print of this article already went viral on social media because the researchers listed ChatGPT as an author. But the removal of ChatGPT from the list of authors in the final article indicates that this too was just a publicity stunt.
“As for the article itself, the findings are not as straightforward as the press release indicates. Here’s one example:
“The authors claim that ‘ChatGPT produced at least one significant insight in 88.9% of all responses’ (8). But their definition of ‘insight’ as ‘novelty, nonobviousness, and validity’ (7) is too vague to be useful. Furthermore, the authors insist that these ‘insights’ indicate that ChatGPT ‘possesses the partial ability to teach medicine by surfacing novel and nonobvious concepts that may not be in the learner’s sphere of awareness’ (10). But how can an unaware learner distinguish between true and false insights, especially when ChatGPT offers ‘accurate’ answers on the USMLE only a little more than half the time?
“The authors’ claims about ChatGPT’s insights and teaching potential are misleading and naive.”
No conflicts of interest.