ROBINS-E: Good studies gone bad

Assessing environmental hazards often requires evaluating a diverse body of evidence of varying quality. It is critical to consider the credibility of the individual studies used in the evaluation and to reach conclusions through consistent, transparent, and empirically demonstrated methods such as those used in systematic review. The GRADE Working Group recently released one such tool, ROBINS-E (Risk Of Bias in Non-randomized Studies–of Exposure). Given the widespread recognition and acceptance of the GRADE approach, this tool has the potential for rapid and extensive uptake. But is it the right tool to use? To find out, we took it for a test drive with environmental health studies and published one of the first user-based experiences with ROBINS-E. While the tool has a number of strengths overall, we found several concerning limitations.

A little background: Slowly but surely, systematic review methods are gaining a stronghold in environmental health. Not long ago, environmental health decision-making relied on “narrative reviews,” in which researchers searched the scientific literature, cherry-picked interesting studies, counted the number of “positive” versus “negative” studies, and ultimately said that “more research is necessary” without reaching clear bottom-line conclusions about the scientific question. We have since come a long way.

Systematic reviews using objective, transparent, and reproducible methods to evaluate the evidence are less biased and result in better, actionable information. Several frameworks (Table 1) are now well-known, demonstrated methods recommended by the National Academy of Sciences (NAS) as exemplary approaches to systematic review.

Table 1: Systematic Review Frameworks for Environmental Health

| Systematic Review Method | Scope |
| --- | --- |
| GRADE (Grading of Recommendations Assessment, Development and Evaluation) | Assessment of internal validity for randomized and nonrandomized studies of interventions |
| Navigation Guide | Assessment of human and non-human animal evidence to develop recommendations for preventing harmful environmental exposures |
| OHAT (Office of Health Assessment and Translation) | Assessment of evidence that environmental chemicals, physical substances, or mixtures may cause non-cancer health effects that inform government determination of whether these substances may be of concern |

One critical step in a systematic review is understanding each study’s internal validity, or “risk of bias”: are there flaws in the design, conduct, or analysis that lead to biases and ultimately affect study results? Available tools vary in their approaches to evaluating risk of bias, likely because the extension of decision criteria from the clinical sciences to environmental health is still in development.

This is where we found some problems with ROBINS-E. We all know by now that scoring is bad and shouldn’t be used to evaluate study quality, as it leads to biased and inaccurate results. Though ROBINS-E doesn’t explicitly score studies, it effectively does so through a rating system that bases the overall rating on the highest (worst) risk of bias rating given to any individual domain. This is problematic for two reasons. First, it assumes equal weighting of each domain, when in fact there is no empirical evidence showing how each risk of bias domain should be weighted, whether equally or not.

Second, it fails to distinguish between a study that receives a “critical” risk of bias rating in only one or two domains (such as study 1 or study 2 in the table below) and a study that receives “critical” ratings across several or even all domains (such as study 5 in the table below). Most would agree the second scenario is more problematic and likely reflects a lower-quality study than the first; that is, study 5 is lower quality than studies 1 or 2. However, ROBINS-E rates the overall risk of bias as “critical” for all these studies, thus treating them as equivalent.

Table 2: Hypothetical Risk of Bias Ratings using ROBINS-E

The ROBINS-E tool rates studies 1, 2, 4, and 5 as having the same overall risk of bias (critical), even though study 5 has critical risk of bias in 7/7 domains, studies 1 and 2 have critical risk of bias in only 2/7 domains, and study 4 has critical risk of bias in 2/7 and serious risk of bias in 4/7 domains. Only study 3, with low risk of bias in 7/7 domains, is rated as low risk of bias overall.
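To make the "worst domain wins" behavior concrete, here is a minimal Python sketch of the rule as described above. It is a simplified reading for illustration, not the tool's official algorithm. The function name `overall_rating`, the four-level scale ordering, and the "low" ratings filled in for domains the caption does not specify are all assumptions.

```python
from collections import Counter

# Rating scale ordered from least to most severe (assumed ordering).
LEVELS = ["low", "moderate", "serious", "critical"]


def overall_rating(domain_ratings):
    """Return the most severe rating found in any single domain."""
    return max(domain_ratings, key=LEVELS.index)


# Hypothetical studies, seven domains each. Where the caption gives no
# rating for a domain, "low" is assumed purely for illustration.
studies = {
    "study 1": ["critical"] * 2 + ["low"] * 5,
    "study 2": ["critical"] * 2 + ["low"] * 5,
    "study 3": ["low"] * 7,
    "study 4": ["critical"] * 2 + ["serious"] * 4 + ["low"],
    "study 5": ["critical"] * 7,
}

for name, ratings in studies.items():
    counts = Counter(ratings)
    # Summarize how many domains fall at each level, skipping empty levels.
    profile = ", ".join(f"{counts[lvl]} {lvl}" for lvl in LEVELS if counts[lvl])
    print(f"{name}: overall = {overall_rating(ratings)} ({profile})")
```

Under this rule, the domain profile printed in parentheses is discarded once the overall rating is assigned, which is why studies with two critical domains and a study with seven end up looking the same.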

While not technically quantitative scoring, the effect is the same: it reproduces several of the fundamental flaws of scoring studies. It also makes it almost impossible for a study to receive a “low” overall risk of bias rating, even a well-designed and well-implemented one, because the bar is set so high.

ROBINS-E is also based on a comparison of observational studies against a hypothetical “ideal” randomized controlled trial (RCT), considered the “gold standard” of all studies because of its protection against bias. However, this can easily result in high-quality observational studies being rated poorly simply because they cannot stack up to a hypothetical RCT. This would be like comparing my running times to those of Usain Bolt. Of course, it would be nice to clock top speeds of 27.8 miles per hour, but if I compare all my running times to this unrealistic “gold standard,” I would get dejected pretty quickly! Instead, wouldn’t it make more sense to compare to something more analogous and realistic, like average runners in my age bracket? The same principle holds for evaluating observational study quality: studies should be rated on the merits relevant to the study question at hand, not compared to some unrealistic “gold standard” that could never be attained. Furthermore, although Usain Bolt is undoubtedly the gold standard of running, in reality RCTs have their own limitations, with ethics, generalizability, and small sample sizes being some examples.

These and other critical concerns led us to conclude that ROBINS-E is not an appropriate tool for evaluating risk of bias in observational studies of exposures. We identified many concerns with its approach to risk of bias assessment and, ultimately, with its utility to systematic reviewers or policy-makers looking for results to inform public health decisions. Instead, we recommend existing tools already demonstrated extensively in case studies, such as the NTP OHAT and Navigation Guide. As systematic review continues to spread in the field of environmental health, it is critical to use tools that provide the best approaches to evaluating the evidence fairly and that result in answers supported by the evidence.


Juleen Lam, PhD, is an Assistant Professor in the Department of Health Sciences at California State University, East Bay. She has a joint research affiliation with the University of California, San Francisco. Dr. Lam’s research interests include environmental epidemiology, evaluation of population exposures to environmental contaminants, assessment and communication of environmental risks, and reproductive/developmental health. She has been involved in developing systematic review methods for environmental health data for several years and has been a pivotal part in implementing, publishing, and disseminating these approaches in both academic and government settings. She currently serves as a member of the US EPA’s Board of Scientific Counselors (BOSC) Chemical Safety for Sustainability Subcommittee through 2020 and is on the National Academy of Sciences (NAS) Committee to Review the DOD’s Approach to Deriving an Occupational Exposure Limit for TCE through September 2019.
Dr. Lam’s co-authors include Lisa Bero, Nicholas Chartres, Joanna Diong, Alice Fabbri, Davina Ghersi, Agnes Lau, Sally McDonald, Barbara Mintzes, Patrice Sutton, Jessica Louise Turton, and Tracey Woodruff.