The technology behind ChatGPT scored better at assessing eye problems and providing advice than non-specialist doctors, a new study has found.
A study led by the University of Cambridge has found that GPT-4, the large language model (LLM) developed by OpenAI, performed nearly as well as specialist eye doctors in a written multiple-choice test.
The AI model, which is known for generating text based on the vast amount of data it is trained on, was tested against doctors at different stages of their careers, including junior doctors without a specialism, as well as trainee and expert eye doctors.
Each group was presented with dozens of scenarios in which patients have a specific eye problem, and asked to give a diagnosis or advise on treatment by selecting one of four options.
The test was based on written questions, taken from a textbook used to test trainee eye doctors, about a range of eye problems – including sensitivity to light, decreased vision, lesions, and itchy eyes.
The textbook on which the questions are based is not publicly available, so researchers believe it is unlikely the large language model has been trained on its contents.
GPT-4 scored significantly higher on the test than junior doctors, whose level of specialism is comparable to that of general practitioners.
The model achieved similar scores to trainee and expert eye doctors, but it was beaten by the top-performing experts.
The research was conducted last year using the latest available large language models.
The study also tested GPT-3.5 (an earlier version of OpenAI’s model), Google’s PaLM2 and Meta’s LLaMA on the same set of questions. GPT-4 gave more accurate responses than any of the other models.
The researchers have said that large language models will not replace doctors, but they could improve the healthcare system and reduce waiting lists by supporting doctors to deliver care to more patients in the same amount of time.
Dr Arun Thirunavukarasu, the lead author of the paper, said: “If we had models that could deliver care of a similar standard to that delivered by humans, that would help overcome the problems of NHS waiting lists.
“What that requires is trials to make sure it’s a safe and effective model. But if it is, it could be revolutionary for how care is delivered.”
He added: “While the study doesn’t indicate deployment of LLMs in clinical work immediately, it gives a green light to start developing LLM-based clinical tools as the knowledge and reasoning of these models compared well to the expert ophthalmologists.”