Rules and regulations that govern our air, water, food, and the products in our homes should be based on the best available scientific evidence.
EPA, however, is using a faulty systematic review method that can exclude critical evidence and have negative consequences for public health.
One vital step in a systematic review is to assess the risk of bias of individual studies. Risk of bias is a measure of internal validity: it considers how a study was designed and conducted and whether those features might shift the study's findings to be more positive or more negative. For example, we evaluate factors such as how well the study measured exposure to a chemical. Some systematic review methods also include a measure of overall study quality when assessing risk of bias.
In the clinical sciences, unlike environmental health, there is general consensus on most (though not all) of the domains that make up the tools used to assess risk of bias in randomized controlled trials (RCTs). In plain language, that means it has been scientifically shown that when certain study design features are missing, such as blinding the study personnel who evaluate the outcome of the study (in other words, they know which participants received the intervention and which did not), you will get an overestimate of the efficacy of the intervention.
While numerous systematic review methods have been developed by US regulatory agencies, in environmental health there is still no agreed-upon tool for evaluating the risk of bias of observational human studies. Thus, we set out to study the implications of applying the different methods available to assess risk of bias. In our study, we examined how three other systematic review tools compare with PRHE's Navigation Guide and how they could lead to different conclusions, which could have important policy implications, as regulators often use systematic reviews when determining the toxicity of a chemical.
The tools we compared are the following:
- PRHE’s Navigation Guide is designed to assess the risk of bias and internal validity of a study. It does not use an overall rating system and is derived from the same constructs as the tools used in the clinical sciences.
- The Office of Health Assessment and Translation (OHAT) tool also assesses risk of bias and internal validity and uses the same constructs as the tools used in the clinical sciences.
- The Integrated Risk Information System (IRIS) tool, used by EPA for study evaluation, applies an overall rating system.
- The Toxic Substances Control Act (TSCA) tool covers topics related to data quality, including risk of bias. It uses quantitative scores, mixes reporting quality with risk of bias, and applies an arbitrary rating system. We have written extensively on the scientific faults of this tool on our blog.
We compared the risk of bias methods in these tools using 15 studies that were previously included in a systematic review conducted with PRHE's Navigation Guide. Using the Navigation Guide to review the studies, scientists found sufficient evidence supporting an association between polybrominated diphenyl ether (PBDE) exposure and reduced IQ. Critically, the National Academies of Sciences found there was “no evidence of risk of bias in the assessment,” and the committee used the Navigation Guide review as a basis for its own assessment. However, we found that two of the other tools we applied to these studies would have prevented us from reaching this conclusion. So, what is the difference?
Across the three tools and the Navigation Guide, we found that risk of bias was rated similarly in some domains. However, the tools varied in whether they treated overall study quality as a risk of bias indicator, and that ultimately has implications for the overall body of evidence.
The IRIS and TSCA tools both included a measure of overall study quality; however, empirical studies have found that this practice is not supported. That means there is no scientific justification for giving a study an overall score or rating, because we don't know how much each metric or domain should be weighted.
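To see why, consider a minimal sketch (with hypothetical domain ratings and weighting schemes, not taken from any of these tools): the very same study can be rated acceptable or unacceptable depending only on which arbitrary weights are chosen.

```python
# Hypothetical domain ratings for a single study
# (1 = low risk of bias, 0 = high risk of bias)
ratings = {
    "exposure_measurement": 1,
    "confounding": 0,
    "outcome_assessment": 1,
    "participant_selection": 1,
}

# Two weighting schemes, both arbitrary: no empirical evidence tells us
# how much each domain should count toward an overall score.
weights_a = {"exposure_measurement": 0.4, "confounding": 0.1,
             "outcome_assessment": 0.3, "participant_selection": 0.2}
weights_b = {"exposure_measurement": 0.1, "confounding": 0.6,
             "outcome_assessment": 0.2, "participant_selection": 0.1}

def overall_score(ratings, weights):
    """Weighted overall 'quality' score in [0, 1]."""
    return sum(ratings[d] * weights[d] for d in ratings)

for name, weights in (("scheme A", weights_a), ("scheme B", weights_b)):
    score = overall_score(ratings, weights)
    verdict = "acceptable" if score >= 0.5 else "unacceptable"
    print(f"{name}: score = {score:.2f} -> {verdict}")
# scheme A: score = 0.90 -> acceptable
# scheme B: score = 0.40 -> unacceptable
```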
With the TSCA tool, all 15 studies were rated “unacceptable” in study quality because of one question regarding statistical power. However, power is not actually an indicator of risk of bias; whether power is reported reflects the completeness of a study's reporting. Just because the authors may or may not have reported statistical power doesn't mean the study is biased in one direction or the other.
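As a minimal sketch of this point (using a standard normal approximation and hypothetical numbers), statistical power is simply a function of sample size and effect size; nothing in the calculation speaks to whether a study's results are biased.

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def two_sample_power(effect_size, n_per_group, z_crit=1.96):
    """Approximate power of a two-sided, two-sample test at alpha = 0.05."""
    return normal_cdf(effect_size * sqrt(n_per_group / 2) - z_crit)

# Power changes with sample size alone; the effect estimate itself is
# untouched, so low (or unreported) power tells us nothing about bias.
for n in (20, 80, 200):
    print(f"n = {n:3d} per group -> power = {two_sample_power(0.5, n):.2f}")
# n =  20 per group -> power = 0.35
# n =  80 per group -> power = 0.89
# n = 200 per group -> power = 1.00
```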
The IRIS tool uses overall study quality ratings similar to the TSCA tool's. Here, when we applied the instructions exactly as written, all studies were rated “low” or “uninformative,” mostly due to questions about confounding and participant selection. So, based on one possible risk of bias to a study's results, the entire study was excluded. Again, there is no scientific justification for such an approach.
Following the instructions for the IRIS and TSCA tools, these poor study-quality ratings meant that the studies would be removed from the overall body of evidence. If we had chosen these tools for our systematic review examining the association between PBDEs and ADHD/IQ, there would have been no studies available to demonstrate how PBDEs decrease IQ in children.
In contrast, the OHAT and Navigation Guide tools do not include overall study confidence as a risk of bias indicator. Those tools consider all possible sources of bias, and their instructions state that each study should be evaluated on its strengths and limitations. There are well-established methods that allow every study to be included in a systematic review, with an analysis then conducted to see whether risk of bias actually influences the results of the review, as in the sketch below.
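For illustration, here is a minimal sketch (with made-up effect estimates, not data from our study) of one such sensitivity analysis: pool all studies with inverse-variance weighting, then re-pool only the studies rated at low risk of bias and compare the results, rather than discarding studies outright.

```python
from math import sqrt

# Hypothetical studies: (effect estimate, standard error, high risk of bias?)
studies = [
    (-3.1, 1.2, False),
    (-2.4, 0.9, False),
    (-1.8, 1.5, True),
    (-4.0, 2.0, True),
]

def pooled(subset):
    """Fixed-effect, inverse-variance pooled estimate and standard error."""
    weights = [1 / se**2 for _, se, _ in subset]
    estimate = sum(w * est for w, (est, _, _) in zip(weights, subset))
    return estimate / sum(weights), sqrt(1 / sum(weights))

all_est, all_se = pooled(studies)
low_est, low_se = pooled([s for s in studies if not s[2]])

print(f"All studies:           {all_est:.2f} (SE {all_se:.2f})")
print(f"Low risk of bias only: {low_est:.2f} (SE {low_se:.2f})")
# If the two pooled estimates agree, the rated biases are unlikely to be
# driving the overall result -- and no study had to be thrown away to learn this.
```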
Given the importance of systematic reviews in environmental health and their increasing popularity, we recommend that researchers use the Navigation Guide or OHAT tools when evaluating the risk of bias across studies. By considering the quality of the entire study, rather than making arbitrary decisions about a study's validity based on only one metric or domain, the Navigation Guide and OHAT tools help protect our children from harmful chemical exposures.
For more information on systematic review:
- Wolf in sheep’s clothing, part 1: EPA’s TSCA systematic review method, PRHE Blog
- Wolf in sheep’s clothing, part 2: How EPA’s TSCA systematic review method is threatening public health, PRHE Blog
About the author
Stephanie Eick, MPH, PhD is a reproductive and environmental epidemiologist and has been a postdoctoral scholar with the UCSF Program on Reproductive Health and the Environment since June 2019. She completed her PhD in Epidemiology from the University of Georgia in 2019 and an MPH in Epidemiology from Emory University in 2016.
Co-authors on the study are Dana Goin, Nicholas Chartres, Juleen Lam, Tracey Woodruff.