- Language-generating artificial intelligence could empower science, but a lack of transparency and the oversimplification of complex data could threaten scientific professionalism.
- The report's authors call on government bodies to enforce systematic regulation so that the potential of large language models in science can be realised.
Large language models (LLMs) are artificial intelligence algorithms that recognise, summarise, and generate human language, trained on very large text-based datasets. LLMs could empower scientists to extract information from big data; however, researchers from the University of Michigan are concerned that without appropriate regulation, LLMs could threaten scientific professionalism and intensify public distrust in science.
A recent report examined the potential social change brought about by LLMs. In a subsequent Nature Q&A, the report’s co-author, Professor Shobita Parthasarathy, described the potential impact of LLMs on the scientific disciplines. She highlighted the potential for LLMs to help large scientific publishers automate aspects of peer review, generate scientific queries, and even evaluate results, but cautioned that without systematic regulation, LLMs could exacerbate existing inequalities and oversimplify complex data.
Developers are not required to disclose the accuracy of an LLM, and the models’ processes are not transparent, so users may be unaware that LLMs can make errors, include outdated information, and strip away important nuance. Furthermore, readers cannot reliably distinguish LLM-generated text from human-written text, meaning the technology could be used to spread misinformation and generate fake scientific articles.
For the potential of LLMs in science to be realised, Prof Parthasarathy calls on government bodies to enforce transparency in their use, stipulating that developers should disclose the models’ processes and make clear wherever an LLM has been used to generate an output.