‘The night is dark and full of terrors’:
or the despair of reading a scientific publication
The famous quote from Game of Thrones perfectly describes my emotions when reading a modern scientific paper. This is particularly the case if I try to dig deep and evaluate whether the evidence is valid or reproducible.
I recently read a paper (published in Science) describing the creation of patient-derived cellular models from cancer tissue. My aim was to assess how well the authors:
a) recreated gastrointestinal cancer tissues (cancer markers, tissue markers, morphology and treatment response)
b) proved their claims were reproducible
I consider myself a seasoned veteran of reading and writing scientific reports; however, I struggled to read the paper and had to force myself to identify all the relevant information. I complained loudly to colleagues about the difficulty of the task and how long it was taking.
The paper, selected at random, was around 7 pages long and had 43 authors from ten research centres. The main text contained four figures, but each contained up to seven sub-figures. There was no methods section; the trend in high-impact journals is to relegate this to the supplementary materials. The supplementary materials ran to a whopping 47 pages, including 13 additional figures (each with multiple sub-figures) and eight large Excel data sheets. There were multiple complex methodologies, including advanced tissue culture, histology, DNA sequencing and mutational analysis. You can see the scale of the task. Maybe you can understand why I made the comparison with the futility of warring against the army of the undead.
It took me a determined day to read through all the data and find all the evidence relevant to my evaluation (or confirm the lack of it).
After completing my analysis (posted in previous blogs and tweets), I found that:
a) The authors reported selectively. They didn’t report all the evidence for all their models. It wasn’t even clear whether they were only interested in reporting on particular models. This leaves you with the impression that the authors only report the results they want you to see.
b) The models were not well validated (morphology was not always consistent with the patient, there was limited evidence for colon or tumour markers, mutations were not always identical to the patient)
c) Response to treatment was very opaque. To clarify, the patient response was only mentioned in the text and it was not always clear what that response was. The model’s response to treatment was presented as a relative effect – so the absolute results could not be judged. In addition, there were no statistical analyses to prove a treatment effect nor that it was concordant with the patient. The reader had to trust the stated text and could not confirm it from clearly reported quantitative results.
d) Sadly (but not unusually), the data were frequently based on three replicates. This is not robust: with so small a sample, the likelihood of the result repeating is very low. Ideally, the authors should have calculated a sample size based on the effect size seen in the patients.
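To give a rough feel for point d), here is a minimal sketch of statistical power (the chance of detecting a real effect) for a two-group comparison. It uses a normal approximation, which is optimistic for very small samples, and the standardised effect size of 1.0 is an assumption chosen purely for illustration; it is not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Normal approximation only: it overstates power when n is very small,
    so treat the result as an upper bound rather than an exact figure.
    effect_size_d is the difference in means divided by the standard deviation.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size_d * sqrt(n_per_group / 2)
    # Probability of landing beyond the critical value in either tail
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Assumed effect size of 1.0 (one standard deviation), for illustration only
for n in (3, 10, 20):
    print(f"n = {n:2d} per group -> approximate power {approx_power(1.0, n):.2f}")
```

Under these assumptions, three replicates per group leave the power well below the conventional 80% target, so a real effect would be missed more often than it would be found.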
Overall, I was left with the impression of selective reporting, opacity of the absolute data and too much relative or categorical data. I don’t think a statistician would have approved these data for publication, assuming they could get through the paper’s complexity.
Basic standards of statistical reporting and the ease of reading appear to have been replaced by increasing data volume and the inclusion of many complex methodologies; this cannot be good for the scientific community. We need to remind ourselves of the reason that scientists publish: to pass on knowledge. Scientific knowledge must be reported in a manner that allows it to be understood, repeated and clearly evaluated by other scientists.
Other researchers have also suggested that scientific papers are becoming too dense, too complex and too hard to read. I felt that the current research would be more useful to the community if the evidence had been reported in multiple, shorter articles, each focusing on one question and reporting the results clearly using absolute values. It is the lack of clarity in the evidence that is most disturbing; anyone who may want to invest in the models should be able to evaluate reproducibility and effect size. Investors should not assume the authors’ conclusions are correct without this evidence. QED Biomedical uses in-house tools to evaluate the validity of new preclinical models and can assess the likelihood that treatment responses are reproducible.
Poor reporting of statistical analysis in preclinical organoid models for drug testing
Our recent evaluation of a scientific publication comparing the response of preclinical models to anticancer treatment with the response in patients showed a shocking lack of statistical reporting and rigour.
Models were tested in triplicate for a response to cetuximab. No sample size calculation was performed. It is highly unlikely that three replicates were sufficient to detect any significant or reproducible effects of cetuximab on organoid growth. Statistical guidance has suggested that a sample size of at least 10 replicates should be applied to preclinical models. The lack of sample size calculations is one of the greatest causes of false findings, yet it remains one of the most overlooked reporting criteria in biomedical science.
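A sample size calculation need not be complicated. The sketch below uses the standard normal-approximation formula for comparing two group means, n = 2(z_alpha + z_beta)^2 / d^2 per group, where d is the expected difference divided by the standard deviation. The effect sizes, the 5% significance level and the 80% power target are assumptions for illustration only; they are not values from the publication, and an exact t-based calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Replicates needed per group for a two-sided, two-group comparison.

    Normal approximation: n = 2 * (z_alpha + z_beta)^2 / d^2, rounded up.
    effect_size_d is the standardised difference (difference in means / SD).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)


# Hypothetical effect sizes, chosen only to show how quickly n grows
for d in (2.0, 1.0, 0.5):
    print(f"effect size {d}: {n_per_group(d)} replicates per group")
```

Even for a very large assumed effect (two standard deviations), this approximation asks for more than three replicates per group, and the requirement grows quickly as the expected effect shrinks.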
To compound the issue reported above, the authors reported treatment effect as a relative value, which means the absolute values for the response to the drug treatment or to controls could not be evaluated. In addition, the authors provided no statistical analysis to prove that the drug had an effect (in comparison to the control) nor did they provide statistical evidence that the organoids replicated the patients' responses. The reader was expected to take it on good faith from narrative descriptions.
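To make the point concrete, here is a small sketch of what transparent reporting could look like: absolute means for both arms, the relative value derived from them, and an exact permutation test. The viability readings are invented purely for illustration and do not come from the publication. The test itself is one simple choice among several; it is used here because it needs no distributional assumptions.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(treated, control):
    """Two-sided exact permutation test on the difference in means."""
    pooled = list(treated) + list(control)
    k = len(treated)
    observed = abs(mean(treated) - mean(control))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Enumerate every way of relabelling k of the pooled values as "treated"
    for chosen in combinations(indices, k):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical readings (arbitrary units), invented for illustration only
control = [100.0, 96.0, 104.0]
treated = [41.0, 38.0, 45.0]

print("control mean:", round(mean(control), 1))
print("treated mean:", round(mean(treated), 1))
print("relative value:", round(mean(treated) / mean(control), 2))
print("exact permutation p:", exact_permutation_p(treated, control))
```

With three replicates per arm there are only 20 possible relabellings, so even perfectly separated groups cannot give a two-sided p-value below 2/20 = 0.1 with this test. That is another way of seeing why triplicates alone struggle to support a claim of treatment effect.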
Why should you care? Organoids have been proclaimed the 'method of the year' and are part of a growing industry for preclinical testing of drugs. There are supporting industries in bioinformatics helping researchers find the best models for their needs. We must ensure that, no matter how complex the chosen model, the basic tests of reproducibility are reported alongside the discoveries and the limitations are acknowledged. This is what will ensure that biomedical discoveries translate efficiently to the clinic. Precision medicine needs to be precise.
QED Biomedical is helping to tackle the reproducibility crisis. We can help you design your experiments better or we can help you identify the strengths and weaknesses in existing research.
QED Biomedical blog posted by CATAPULT Medicines Discovery
New methods to improve the translation of biomedical science towards clinical trial success
Moving forward - implementing systematic review and identifying reliable evidence
An article from our series: Views from the Discovery Nation
It is now over ten years since John Ioannidis published his essay ‘Why Most Published Research Findings Are False’. The paper presented statistical arguments to remind the research community that a finding is less likely to be true if: 1) the study size is small, 2) the effect size is small, 3) the protocol is not pre-determined or not adhered to, 4) it is derived from a single research team. Proposed remedies included establishing the correct study size, having multiple teams reproduce the results and improving the reporting standards of scientific journals.
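The core of the statistical argument can be sketched in a few lines. In the essay, the probability that a claimed finding is true (the positive predictive value, PPV) in the absence of bias is (1 - beta)R / (R - beta*R + alpha), where R is the pre-study odds that a tested relationship is real, alpha is the significance level and 1 - beta is the power. The values of R and power used below are assumptions for illustration only.

```python
def positive_predictive_value(prior_odds, power, alpha=0.05):
    """PPV of a claimed finding, ignoring bias.

    prior_odds: pre-study odds R that the tested relationship is true
    power:      1 - beta, the chance of detecting a true relationship
    alpha:      significance level (false positive rate)
    """
    beta = 1 - power
    return (power * prior_odds) / (prior_odds - beta * prior_odds + alpha)


# Illustrative assumption: 1 in 5 tested relationships is real (odds 1:4)
for power in (0.8, 0.5, 0.2):
    ppv = positive_predictive_value(prior_odds=0.25, power=power)
    print(f"power {power:.1f} -> PPV {ppv:.2f}")
```

Under these assumed odds, dropping from a well-powered study to a badly underpowered one moves a "significant" finding from probably true to roughly a coin toss, which is why study size matters so much.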
So, have any of these suggestions been taken forward? There is increasing promotion of open science, a movement which supports increased rigour, accountability and reproducibility of research. In addition, some scientific journals have introduced stricter requirements to improve reporting standards. Whether or not these recommendations are followed remains to be established. Evidence from a systematic review we conducted on the asymmetric inheritance of cellular organelles highlights the problems in basic science reporting and study design.
Of 31 studies (published between 1991 and 2015), not one performed calculations to determine sample size, 16% did not report technical repeats and 81% did not report intra-assay repeats. We need to educate our future scientists on study design and impose stricter publication criteria.
Impartial and robust evaluations of biomedical science are required to determine which new biomedical discoveries will be clinically predictive. We should be concerned by the lack of reproducibility in biomedical science because it is a major contributing factor in the low rate of clinical trial success (only 9.6% of phase 1 studies reach approval stage). Our ability to judge which discoveries will have real-life effects is crucial.
What can be learned from clinical research?
A lot can be learnt from clinical research. The publication of clinical research must follow strict reporting guidelines to ensure transparency, and decision making is based on unbiased evidence gathered by systematic review. Systematic reviews provide a methodology to identify all research evidence relating to a given research question.
Importantly, systematic reviews are unbiased (not opinion-based) and are carried out according to strict guidelines to ensure the clarity of the results and their reproducibility.
They form the basis of all decision making in clinical research. Importantly, the evidence gathered by the review is assessed for quality. This allows the review to present results graded according to the ‘best evidence’. The quality of the clinical evidence is judged primarily according to the study design (randomisation, blinding of participants and research personnel, etc.). There are many risk-of-bias tools available, each appropriate to a different study design. The GRADE approach seeks to incorporate the findings of a systematic review alongside study size, imprecision, consistency of evidence and indirectness, providing a clear summary of the strength of evidence to guide recommendations for decision making.
It makes sense for decision making in biomedical research to be judged in a similar open and unbiased manner. Astoundingly, the choice of which biomedical discovery is suitable for further investment is usually an opinion-based decision. Big steps have been made to introduce systematic review to preclinical research. There are reporting guidelines for animal studies and risk of bias tools. The quality of these studies is again based on study design.
At a basic bioscientific level, one must argue that judging the quality of evidence should focus on more fundamental aspects of the research: 1) how valid is the chosen model (or how well does it recapitulate the human disease of interest)? 2) how valid is the chosen marker (or how well does it identify the target)? 3) how reliable is the result? Pioneering work by Collins and Lang has introduced tools to perform such judgments. These tools aim to directly address the issues raised by John Ioannidis. They aim to highlight the strengths and gaps in a given research base.
The original blog can be found here: https://md.catapult.org.uk/new-methods-to-improve-the-translation-of-biomedical-science-towards-clinical-trial-success/