A central online news resource for professionals involved in the development of medical publications, publication planning, and medical writing.
Spotting fake images in scientific research: insights from science integrity consultant Elisabeth Bik
Many of us will be familiar with the concept of plagiarised text as a form of misconduct within scientific literature, but perhaps a lesser-known problem, and one which most of us would find much harder to spot, is the publication of manipulated images. Elisabeth Bik is a science integrity consultant who has been described as a super-spotter or image sleuth due to her unique talent for identifying scientific photos that have been tampered with. Elisabeth strives to tackle the issue of scientific misconduct and has a blog dedicated to the topic of science integrity. To date, her scientific detective skills have led to 951 retractions, 122 expressions of concern, and 956 corrections. The Publication Plan spoke to Elisabeth to find out more about her work.
Could you tell us how and why you became involved in investigating fraudulent scientific work and how you discovered your talent for spotting duplicated/manipulated images?
“In 2013 I heard about plagiarism, so I took a sentence that I had written and put it into Google Scholar to see if anybody had used my text. I had not expected any results, but by chance the sentence that I had picked randomly had been stolen by somebody else, so I found a paper that had plagiarised my text, and that of many others. I subsequently kept on finding more and more papers that had plagiarised other people’s work. I worked on that for about a year whilst I was working full-time at Stanford, so it was a kind of weekend project. Then in around 2014 I came across a PhD thesis, not one that had stolen my work but one that had plagiarised text, and one that also contained images – western blots. A couple of the figures had panels that had been reused, so the same panel had been used to represent different experiments. The panel had a very distinctive shape and so I realised that I had some talent for spotting these things, and started searching for other papers with similar image issues.”
What do you look for when analysing images, and what are the most common issues you encounter?
“I look for photos specifically because they contain a lot of information, much more than a line graph.”
“I look for photos specifically because they contain a lot of information, much more than a line graph. A line graph could be duplicated but it is very hard to remember, as it’s just a line. Whereas there are features in photos that you can remember, at least for a short period, so I compare photos within scientific papers. Because I mainly focus on photos of blots or gels, or microscopy photos of tissues and cells, those are typically the types of images where I find issues, but sometimes I work on photos of plants or mice, visible objects that don’t require a microscope. Occasionally I will find a plot that has been duplicated but, as I said, plots are hard to find, so I don’t focus on those. I look for duplications. There are three main duplication problems: two panels that have been duplicated; two panels that have been duplicated and shifted so that they sort of overlap; and duplication of elements within a photo, for example, a group of cells might be visible multiple times. Occasionally I will also find evidence suggestive of tampering with a photo, for example, you might see a different background around one particular band in a gel, which indicates that it did not originate from that photo. This example is not a duplication but a sign of potential tampering – that parts of the photo came from somewhere else.”
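As an editorial aside: the comparisons Elisabeth describes are done by eye, but the same kind of check can be approximated in software. The sketch below is a minimal, hypothetical illustration – not her workflow or any specific screening tool – of scoring two figure panels for possible duplication using normalized cross-correlation with OpenCV; the file names and the rough 0.9 threshold are assumptions for illustration only.

```python
# Minimal, hypothetical sketch: score two figure panels for possible duplication
# using normalized cross-correlation (OpenCV). Illustrative only; not the
# workflow described in the interview, and real screening does far more.
import cv2


def duplication_score(path_a: str, path_b: str) -> float:
    """Return the peak normalized cross-correlation between two panel images.

    Scores close to 1.0 suggest the panels may share content, including
    shifted (overlapping) duplications; rotations and stretches are not
    handled by this simple approach.
    """
    a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    if a is None or b is None:
        raise FileNotFoundError("Could not read one of the panel images")

    # Use the smaller panel as the template so it can slide over the larger one.
    template, scene = (a, b) if a.size < b.size else (b, a)

    # matchTemplate requires the template to fit inside the scene image.
    th, tw = template.shape
    sh, sw = scene.shape
    if th > sh or tw > sw:
        scale = min(sh / th, sw / tw)
        template = cv2.resize(
            template, (max(1, int(tw * scale)), max(1, int(th * scale)))
        )

    result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
    return float(result.max())


# Hypothetical usage: the panel file names are placeholders, and a score above
# roughly 0.9 would only flag the pair for closer manual inspection.
# print(duplication_score("panel_a.png", "panel_b.png"))
```

As the interview makes clear, rotated, stretched, or partially reused panels defeat such a simple comparison, which is part of why trained human screening remains important.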
How common and widespread is the problem of duplicated/manipulated images within the scientific literature and what are the potential consequences of such images going unidentified?
“Duplications are found in around 4% of papers that contain at least one photo. This finding is based on a systematic search I performed for papers that contain the term ‘western blot’ to enrich for papers with molecular biology photos or other figures. In the resulting set of papers, I scanned 20,000, and I found around 800 to contain duplications, so that’s 4% of papers. Those contained one of the three types of duplication I listed, which could result from an honest error or could have been duplicated intentionally to mislead the reader. The first case, an honest error in a photo, is usually not a big problem. In my opinion it should be corrected, but we all make errors in papers, and so that’s the least concerning. But when images are duplicated with overlaps, or are rotated or stretched, or contain duplicated elements within the same photo, that’s clearly a manipulation of the data. To me those are visible signs of manipulation which cast doubt over all the data in that paper, because if one image has potentially been tampered with or manipulated, then other types of data, which are much harder to check, might have been too. For example, you cannot really see if values in a table have been fabricated or manipulated, so it makes the whole paper less reliable, and maybe also other works by those same authors. In some cases, images are manipulated to make the data look better. If a photo contains duplicated elements, then you can’t even be sure that the experiment happened and what the results were. Duplications within the same photo strongly suggest an intention to mislead and that the results were not obtained as they have been presented. Such fraud in my opinion goes against everything that science should be – science should be about finding the truth and fraud is the opposite of that.”
“Fraud in my opinion goes against everything that science should be – science should be about finding the truth and fraud is the opposite of that.”
What proportion of questionable images do you think could result from honest error and how many are likely to be deliberate acts of misconduct?
“In the study I referred to previously, where I found 800 of 20,000 papers to contain duplicated figures, we estimated that about half of the duplications were deliberate. It is sometimes difficult to know whether a duplication is deliberate in an individual paper, but because we had 800, that was our best guess. It was based on there being roughly an equal distribution of papers over the three duplication categories, so roughly 30% in each category. Since overlapping images could result from honest error, we estimated that about half of the 800 papers had deliberately duplicated or manipulated photos, so 2% of papers overall. Of course the real percentage of manipulation might be much higher: photos at least leave traces if you manipulate them, but, as I said, manipulation of other types of data, such as tables or line graphs, is much harder to detect, so the real percentage of papers with misconduct might be much higher than 2%.”
What systems do journals have in place, if any, to identify problematic images before publication and what are the limitations of these systems?
“Some journals scan all incoming papers for image duplications, and others have traditionally hired people like me, who can spot these duplications, to scan all their accepted papers for image problems. This might only take a couple of minutes per paper, so it’s really not a huge time investment if you know what to look for. After I raised my concerns about 4% of papers having image problems, some other journals upped their game and have hired people to look for these things. This is still mainly being done, I believe, by humans, but there is now software on the market that is being tested by some publishers to screen all incoming manuscripts. The software will search for duplications but can also search for duplicated elements of photos against a database of many papers, so it’s not just screening within a paper or across two papers or so, but it is working with a database to potentially find many more examples of duplications. I believe one of the software packages that is being tested is Proofig. I have never worked with this software so I don’t know exactly what it does or how good it is, but I would love to test it. That said, there have been situations where an editor has informed me that Proofig didn’t find any evidence of a duplication or any evidence of tampering with an image in which I can clearly see a problem. So I think there is a danger if an editor doesn’t really know how to use the software or just blindly relies on the software’s verdict.”
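As an editorial aside: Proofig’s internals are not public and are not described in the interview, so the sketch below is only a generic, hypothetical illustration of the idea of screening incoming figures against a database of previously seen images, here using perceptual hashing with the open-source imagehash library. The function names, threshold, and in-memory “database” are assumptions for illustration, not a description of any commercial product.

```python
# Generic, hypothetical sketch of database-scale figure screening using
# perceptual hashing (the open-source `imagehash` library). This is NOT a
# description of Proofig or any other commercial tool; it only illustrates
# the idea of comparing an incoming figure against previously seen figures.
from PIL import Image
import imagehash

# Toy in-memory "database" mapping a perceptual hash to its source figure.
hash_db: dict = {}


def register_figure(paper_id: str, figure_id: str, path: str) -> None:
    """Store the perceptual hash of a previously screened figure."""
    h = imagehash.phash(Image.open(path))
    hash_db[h] = (paper_id, figure_id)


def find_near_duplicates(path: str, max_distance: int = 5) -> list:
    """Return (source, Hamming distance) for stored figures close to this one.

    Small distances tolerate recompression or minor cropping; rotated,
    stretched, or partially reused panels need stronger methods.
    """
    h = imagehash.phash(Image.open(path))
    return [(source, h - known) for known, source in hash_db.items()
            if (h - known) <= max_distance]


# Hypothetical usage with placeholder identifiers and file names:
# register_figure("paper_001", "fig_2a", "paper_001_fig_2a.png")
# print(find_near_duplicates("new_submission_fig_3.png"))
```

Any hit from such a screen would, as with manual sleuthing, only be a prompt for an editor to look more closely, not a verdict in itself.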
What kind of response do you tend to get from journal editors when you report a potential issue in one of the papers they have published? Your work has resulted in numerous retractions and corrections – is that a common result when you notify a journal of an issue?
“In the past no response was common – I would just not hear anything. Nowadays I specifically write in my email that I keep track of which journals respond to my message, so I usually receive a notification or acknowledgement of receipt or something like that, but then very often I still hear nothing. I reported that initial set of 800 papers in which I found problems to the journals around 2015 and kept track of what happened – two-thirds of those papers have not been retracted after 5 years; some are still being retracted, so the number is steadily going down, but around 60% of papers have not been addressed. For the more current papers that I’ve reported, that number is slightly better, with half not being addressed after waiting a year or two, but the majority are still not addressed. I get an acknowledgement of receipt but then it seems that nothing happens. When an issue is addressed, the two most common outcomes are a correction or a retraction, which each account for roughly half of cases. There is also a tool called an expression of concern, which is very rarely used but which I feel should be used more, because it provides a very fast way for an editor to flag that they have been alerted to a big problem with the paper and are investigating it, so readers know to proceed with caution if they read that paper. As mentioned, corrections and retractions are the most common outcomes but they are only used in about 40 to 50% of cases – for the majority there is still no outcome after waiting a couple of years.
“Corrections and retractions are the most common outcomes but they are only used in about 40 to 50% of cases – for the majority there is still no outcome after waiting a couple of years.”
But I do feel that the situation is improving – maybe my work has finally earned some acknowledgement that I’m signalling these problems for positive reasons, not out of malice. In the past I have felt I’ve been ignored a little bit more, and I sometimes go to social media to vent about the lack of response from journals, which I feel has helped. So the numbers are getting better, but I feel that journals can still do a much better job.”
How important do you think websites such as PubPeer, Retraction Watch and your own blog, Science Integrity Digest, are in creating transparency and raising awareness of possibly flawed research? Does the creation of such sites indicate an increasing problem or a greater awareness of the need to check the integrity of science?
“I don’t want to talk about my own blog too much, but I do feel that PubPeer and Retraction Watch have played a huge role in openness about problems in papers. There is no other good website where you can report problems. You may try writing privately to a journal, or sometimes there are comments sections in journals, but very often these comments disappear after a while or they never come out of moderation. I feel PubPeer does a really good job in alerting people that there might be a problem with a paper, and it’s the only platform that I know of that we can use. Retraction Watch offers a glimpse of what happens once a paper gets retracted because they provide the background to a retraction. In many cases a retraction notice is very vague, simply stating that the authors or editors decided to retract the paper because of a problem, without indicating what the problem was, which is not fair to the reader because parts of the paper may still be good. We want to know why the paper was retracted and what the specific problem was. Retraction Watch go into a little bit more detail: they interview people – the scientists, the authors, the editors – and ask them for their side of the story. Sometimes you learn that a retraction was actually a very good thing because an author found, for example, a big problem with their paper due to a mistake in a formula, so they did the right thing in retracting their own paper. To hear people talk about why they retracted a paper is very useful and gives you a lot more information. I feel both Retraction Watch and PubPeer create transparency, as a lot of these cases are otherwise hidden by the journals or institutions.
As to whether it is an increasing problem, I do believe it is, for several reasons. First, papers are getting more and more complex, which provides more opportunities to fake data. Digital photography also means it is much easier to digitally alter a photo than it used to be – when I did my PhD you would still bring your gel to the photographer; there was no digital photography and subsequent Photoshopping. Another reason is the increasing pressure to publish. Certain countries have really increased their pressure to publish and made it mandatory to publish, for example, a paper when you finish your Master’s degree, or multiple papers when you finish your PhD, or a paper to get a promotion in medical school. China in particular has issued a lot of these mandatory publication demands. In some cases they are impossible to fulfil as people do not have the time to do the research, but of course they still want to get a promotion or a position at a hospital, so they might just buy a paper. Therefore, there is this whole growing market of paper mills, which are companies that mass-produce papers. There are different models, but they basically sell fake papers to authors who need them, which was not a problem that existed 20 years ago. If you look at papers from 30 years ago I’m sure there was fraud, but those papers usually only contained one figure and one table, so there were fewer opportunities to commit fraud compared with papers today that have 6 to 8 figures and additional supplementary figures. Although I feel that this is an increasing problem, I believe that there is also a greater awareness of the issue.”
What more could be done to improve research integrity within the scientific literature? How do you think the research integrity landscape will have changed in 5 years?
“I hope there is more emphasis on reproducibility in the future because I feel reproducibility is the only way for us to know that an experiment has really been performed and yielded the reported results.”
“I hope there is more emphasis on reproducibility in the future because I feel reproducibility is the only way for us to know that an experiment has really been performed and yielded the reported results. I hope we have less emphasis on output – measuring a scientist’s output by numbers of papers or impact factor – to remove some of that pressure, and instead reward reproducibility. Reproducing a study may not be novel, and of course there is not a lot of funding for it, but I feel it gives so much more validity to a study than trying to do something new. Pre-registration of clinical trials is a wonderful thing, as it requires people to publish their results even if they are negative, which I feel might result in less cheating. I’m also very worried about artificial intelligence (AI) and its potential to create fake papers and images. We’ve seen several examples of what technology can do right now – if you think about dinosaurs in movies, they look more and more real every year – so I think in the next 5 years AI is going to be a huge problem for scientific publishing, because it might generate fake photos, data and text. Distinguishing what is real and what is fake, which may be impossible 5 years from now, will be a problem for journalists too. We need to think about how we can prove that images, photos or other data are real. The obvious errors that we currently use to determine that a paper is probably faked can be overcome by a very smart fraudster – they can make their images look very realistic, and AI is going to help them tremendously, so I’m very worried about that. I’m not quite sure if we can safeguard the integrity of science with the ever-increasing amount of pressure that we put on scientists and the advantages that digital photography and AI can offer fraudsters, so I’m a bit pessimistic there, but I hope we have more funding to look into technical solutions for that. Some of that is solvable – we can maybe look at original images, and at ways of proving that they really came from a microscope, for example, and were not generated by AI. I’m not quite sure how – that goes beyond my technical comprehension of the issue – but there are hopefully ways to solve that.”
Elisabeth Bik is a science integrity consultant. You can contact Elisabeth via LinkedIn.