Wolf in sheep’s clothing, part 2: How EPA’s TSCA systematic review method is threatening public health

UPDATE: EPA is considering dropping TSCA systematic review’s study scoring system, according to Inside EPA. Stan Barone, Deputy Director of the Risk Assessment Division within EPA’s toxics office, criticized the numeric scoring system during an August 24 meeting of the National Academies of Sciences, Engineering, and Medicine (NASEM) committee that is peer reviewing EPA’s systematic review process. Inside EPA also reports on UCSF PRHE’s critiques of the scoring system, a method that has led the agency to inappropriately disregard or downgrade studies. Read the full story for details.

In “Wolf in sheep’s clothing, part 1: EPA’s TSCA systematic review method,” we explain how scientists use systematic review methods to minimize bias and apply more transparent and consistent approaches when evaluating an entire body of evidence to answer a specific research question.

EPA could have used an established method, such as the Navigation Guide method endorsed by the National Academies of Sciences, Engineering, and Medicine (NASEM) and adopted by the World Health Organization and the International Labour Organization, or the similar one established by the National Toxicology Program (NTP). Instead, EPA created an entirely new TSCA systematic review method that has never before been used or tested, and that does not follow the steps required to minimize bias and produce more valid and reliable scientific results. Thus, the science will not be evaluated appropriately, which could lead to underestimating risks and jeopardizing public health.

Using three examples from EPA’s completed draft risk evaluations, we highlight how these flawed systematic review methods have led EPA to underestimate the health risks of the first ten potentially harmful chemicals it evaluated.

  1. EPA’s TSCA method fails to provide sufficient information to describe protocol development, and EPA has failed to publish a protocol for any of the first ten chemicals that have undergone draft risk evaluations.

A protocol minimizes bias and ensures transparency by pre-defining the questions that will be answered, the eligibility criteria for studies to be included, how the studies will be evaluated, and how the evidence will be synthesized in a review—it guides the entire research process.

Think of it like the recipe for your systematic review.

Therefore, it is the first and most critical step in a systematic review, yet EPA has not used a protocol in assessing any of the first ten chemicals, nor does it intend to use a protocol in the evaluation of the next 20 chemicals.

How the lack of a protocol influenced the way EPA evaluated evidence is illustrated by EPA’s failure to predefine the study eligibility criteria it applied to the references in every draft risk evaluation. These criteria take the form of PECO statements (the populations, exposures, comparators, and outcomes a study must address to be included in the review).

The PECO statement must be pre-defined in the protocol before a review begins because doing so allows studies to be included or excluded transparently and without bias, based on the PECO criteria rather than on a study’s results. For example, it stops review authors from selecting studies that may support their pre-established conclusions about the harms of a chemical exposure.

In fact, the Institute of Medicine (IOM), whose 21 standards cover the entire systematic review process and, if adhered to, result in a scientifically valid, transparent, and reproducible systematic review, states that:

Using prespecified inclusion and exclusion criteria to choose studies is the best way to minimize the risk of researcher biases influencing the ultimate results of the SR.

Using the Carbon Tetrachloride draft risk evaluation as an example, EPA published both its literature search strategy and the results of the title and abstract screening of the literature in June 2017. EPA then stated that it conducted a full-text screening, using a PECO framework, to further exclude references that were not relevant to the draft risk evaluation; however, this PECO framework was not published until May 2018, almost a year after the searches and initial screening results had already been published.

The timing of this is very concerning as the PECO framework could have been developed to include/exclude studies that would support a pre-defined health hazard conclusion.

As it stands, without a pre-established protocol, EPA’s screening and inclusion of references in every draft risk evaluation indicates a lack of scientific expertise at best or an intentional effort to bias the evaluation results at worst.

  2. The literature review in EPA’s TSCA method lacks best practices for conducting a systematic and transparent literature review.

We found EPA’s TSCA method to be consistent with only 17 of the IOM’s 27 best practices for conducting a literature search. One feature of EPA’s framework that was inconsistent with IOM’s best practices was:

Document the disposition of each report identified, including reasons for their exclusion if appropriate (IOM 3.4.2).

In other words, make sure all of the studies included at the start of your review are accounted for throughout the entire review process; i.e., studies should not just disappear.

Figure 1. Literature Flow Diagram for Human Health Hazard in TCE Draft Risk Evaluation

Therefore, the inconsistency in the Trichloroethylene (TCE) Draft Risk Evaluation between the number of studies EPA stated it included in the data evaluation (180) and the number of studies it actually evaluated in its data evaluation files is worrying (Figure 1). The Data Quality Evaluation of Human Health Hazard Studies for Animal and Mechanistic Data includes 119 studies that EPA evaluated, and the Data Quality Evaluation of Human Health Hazard Studies for Epidemiological Data includes 96 epidemiological studies that EPA evaluated.

So, 119 + 96 = 215, NOT 180.

Therefore, 35 studies just went missing from the TCE evaluation without any explanation—such inconsistencies are concerning and threaten the validity of the draft risk evaluations.

  3. EPA’s TSCA method utilizes a quantitative scoring method that is incompatible with the best available science and excludes relevant studies from consideration in the risk evaluations based on arbitrary metrics not related to real flaws in the underlying research.

First, EPA used a quantitative scoring method that assigns various weights to quality metrics and then sums the scores to decide whether a study is of “high,” “medium,” or “low” quality, rather than assessing the risk of bias for each metric.

Second, EPA also created an arbitrary list of quality metrics and a rating system that can make studies “unacceptable for use.” In human epidemiological studies, 14 of the 22 quality metrics (see Table 1 below) can be scored as unacceptable due to a “serious flaw,” often based on just one reporting or methodological limitation.
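To see why this kind of scheme worries methodologists, consider a simplified sketch of a weighted-sum scoring system with a “serious flaw” override, like the one described above. The metric names, weights, ratings, and thresholds below are hypothetical illustrations, not EPA’s actual values:

```python
# Illustrative sketch of a weighted-sum study-scoring scheme with a
# "serious flaw" override. All names, weights, and thresholds are
# hypothetical; EPA's actual criteria vary by discipline and chemical.

def score_study(metric_ratings, weights, serious_flaws):
    """Rate a study 'high', 'medium', 'low', or 'unacceptable'.

    metric_ratings: dict mapping metric name -> rating (1 = best, 4 = worst)
    weights: dict mapping metric name -> numeric weight
    serious_flaws: metrics where a worst rating disqualifies the study
    """
    # A single "serious flaw" rating disqualifies the study outright,
    # no matter how strong every other metric is.
    if any(metric_ratings.get(m) == 4 for m in serious_flaws):
        return "unacceptable"

    # Otherwise, sum the weighted ratings and bin the weighted average.
    total = sum(weights[m] * r for m, r in metric_ratings.items())
    avg = total / sum(weights[m] for m in metric_ratings)
    if avg < 1.7:
        return "high"
    elif avg < 2.3:
        return "medium"
    return "low"

# A study that is strong on every substantive metric but did not report
# statistical power is thrown out entirely:
ratings = {"selection": 1, "exposure_assessment": 2, "statistical_power": 4}
weights = {"selection": 2, "exposure_assessment": 2, "statistical_power": 1}
print(score_study(ratings, weights, serious_flaws={"statistical_power"}))
# -> "unacceptable"
```

The sketch makes the two problems concrete: the summed score lets strong metrics mask weak ones (and vice versa), and the override discards an otherwise sound study over a single reporting item, rather than assessing how each limitation might actually bias the results.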

There is no scientific justification for EPA to assign these quantitative scores or for EPA’s selected list of “serious flaws.”

Third, EPA’s “serious flaws” are not all related to real flaws in the underlying research. For example, how well a study was reported does not indicate how well the study was conducted, nor does it necessarily influence the study results.

If you are scratching your head wondering what that means, let us explain.

An example of this is statistical power (the likelihood that a study will detect a true effect), which EPA can mark as a “serious flaw.” However, whether a study reports its statistical power does not reflect the quality of the research.

For example, a small study can be underpowered yet less biased than a larger study. In addition, a small “underpowered” study can be combined with similar studies in a meta-analysis, which increases the statistical power to detect the relationship between an exposure and a health impact.

Table 1
Table 1. EPA’s list of 14 metrics (out of 20 total) that make studies “unacceptable for use in the hazard assessment,” shown in “Updates to the Data Quality Criteria for Epidemiological Studies” in the Draft Risk Evaluation for Perchloroethylene. Note that metrics 3, 4, 6, and 7 are evaluated using reporting guidelines that are not related to real flaws in the underlying research.

In the Draft Risk Evaluation for Perchloroethylene, EPA excluded 10 studies because of “unacceptable” ratings: 5 due to how well a study was reported (metric 4, which is evaluated using reporting guidelines not related to real flaws) and 3 due to statistical power (metric 13). EPA has therefore excluded valuable evidence from this evaluation based on considerations that are not related to real flaws in the underlying research. Thus, EPA could be excluding studies that would help inform and identify health risks from these chemical exposures.

With the public’s health at stake, we are deeply concerned by EPA’s inadequate TSCA method, which is inconsistent with the current, established, best available science for systematic review. Continued use of this method would mean that risks from industrial chemicals and pollutants could be underestimated, leaving the public’s health at risk.

READ MORE on systematic review:
Wolf in sheep’s clothing: EPA’s TSCA systematic review method, part one
Using shoddy methods, EPA says chemical is not risky
ROBINS-E: Good studies gone bad