Where did all the evidence go?

One of the key issues that the scientific community, EPA’s own Science Advisory Committee on Chemicals (SACC), and the National Academies of Sciences, Engineering, and Medicine (NASEM) had with EPA’s original systematic review method (the TSCA Method) for evaluating chemical risks under the Toxic Substances Control Act (TSCA) was how EPA evaluated the quality of the evidence.

How EPA evaluates evidence quality affects its ability to protect the most vulnerable populations from dangerous chemicals like asbestos and formaldehyde. The TSCA Method’s quality evaluation can erroneously exclude evidence, which can lead to underestimating a chemical’s true risks.

The NASEM thought this was inappropriate and recommended EPA no longer use such an approach. But EPA hasn’t followed that advice.

Wrongly Excluding Studies  

A critical step in systematic review to evaluate the quality of the evidence is to assess the risk of bias.

In the original TSCA Method, EPA created a seemingly arbitrary list of quality metrics and a rating system that deemed studies “unacceptable for use,” often based on just one reporting or methodological limitation. These quality metrics and the rating system are not based on scientific best practices in the field.

Additionally, these risk-of-bias domains only indicate whether a feature might influence the study findings; they do not mean that the limitation will lead to an invalid result, so these studies should not be excluded. Best practice is to identify studies with methodological limitations and then test whether including these studies influences the overall evidence findings. If it does, then we know that the studies with limitations may be driving the overall result, and we may choose to report a result that does not include them.
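The sensitivity-analysis step described above can be sketched as follows. This is a minimal illustration with hypothetical effect estimates, not EPA’s code or data; it pools results with and without the flagged studies and compares the two, rather than excluding the flagged studies outright.

```python
# Illustrative sensitivity analysis: pool hypothetical effect estimates with
# and without studies flagged for methodological limitations, then compare.

def pooled_estimate(studies):
    """Fixed-effect, inverse-variance pooled estimate from (effect, se) pairs."""
    weights = [1 / se**2 for _, se in studies]
    effects = [eff for eff, _ in studies]
    total_w = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_w
    pooled_se = (1 / total_w) ** 0.5
    return pooled, pooled_se

# Hypothetical data: (effect estimate, standard error, flagged for a limitation)
studies = [
    (1.20, 0.10, False),
    (1.35, 0.15, False),
    (1.10, 0.20, True),   # has a reporting limitation
    (2.50, 0.30, True),   # has a methodological limitation
]

all_studies = [(e, se) for e, se, _ in studies]
unflagged = [(e, se) for e, se, flagged in studies if not flagged]

with_all, _ = pooled_estimate(all_studies)
without_flagged, _ = pooled_estimate(unflagged)

# If the two pooled estimates differ substantially, the flagged studies may be
# driving the result, and analysts can report the estimate that excludes them.
print(f"Pooled (all studies):    {with_all:.3f}")
print(f"Pooled (unflagged only): {without_flagged:.3f}")
```

The key design point is that the flagged studies still contribute evidence by default; they are set aside only if the comparison shows they materially change the answer.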

The NASEM agreed that a study should not be excluded from evaluation based on one limitation. When other EPA offices have excluded studies due to a single limitation, they have discarded up to half of the evidence.

Thus, NASEM recommended that EPA “Do not exclude studies based on risk of bias, study quality, or reporting quality.”

EPA says it dropped its previous approach to excluding studies, but, as we highlight, in its 2021 Draft TSCA Method EPA states that a “critically deficient” rating in any metric “makes the study unusable for quantitative analysis,” and in the fine print (Table_Apx Q-9), EPA states that for the majority of metrics in animal toxicity studies, a “critically deficient” rating “makes the study unusable.”

So, EPA has not followed the NASEM’s recommendation to “not exclude studies based on risk of bias, study quality, or reporting quality.”

Quantitative Scoring

In its TSCA Method, EPA used a quantitative scoring system that assigned various weights to quality metrics and then summed the scores to decide whether a study was of “high,” “medium,” or “low” quality, rather than assessing the risk of bias for each metric.

Evidence shows that this approach can falsely imply a relationship between scores (i.e., high vs. low) and a study’s result or reliability, because we don’t know how much each metric or domain should be weighted; therefore, using only “high” and “medium” quality studies based on an arbitrary score can lead to a biased body of evidence. There is no scientific justification for using quantitative scores.

This is one reason why the NASEM recommended EPA “not use numeric scores to evaluate studies.”

While the 2021 Draft TSCA Method states, “In response to a variety of commenters, including NASEM and SACC, the TSCA SR Protocol does not include a quantitative/weighted scoring system for data evaluation,” this is false.

As shown in Table Apx R-4, to calculate an overall quality score EPA assigns a numerical rating of 1 to 3 for each metric, with no calculation if a study receives a 4 (critically deficient) for any metric. EPA then sums the metric scores and divides the sum by the total number of metrics in the tool to determine the study’s overall quality score. EPA has assigned arbitrary cutoffs to designate studies “Low,” “Medium,” or “High” quality.
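Our reading of that scoring procedure can be sketched as follows. The per-metric ratings and the cutoff values below are hypothetical placeholders, not EPA’s actual numbers; the point is the mechanics: average the 1-to-3 metric ratings, map the average to a quality tier, and treat any single 4 as disqualifying.

```python
# Illustrative sketch of the scoring scheme described above (hypothetical
# cutoffs and ratings; EPA's actual values appear in the 2021 Draft TSCA Method).

def overall_quality(metric_ratings, cutoffs=(1.7, 2.3)):
    """Map per-metric ratings (1 best, 4 = critically deficient) to a label."""
    if any(r == 4 for r in metric_ratings):
        # One critically deficient metric excludes the study entirely.
        return "Unusable"
    score = sum(metric_ratings) / len(metric_ratings)
    if score <= cutoffs[0]:
        return "High"
    if score <= cutoffs[1]:
        return "Medium"
    return "Low"

print(overall_quality([1, 1, 2, 1]))  # mean 1.25 -> "High"
print(overall_quality([2, 2, 3, 2]))  # mean 2.25 -> "Medium"
print(overall_quality([3, 3, 3, 2]))  # mean 2.75 -> "Low"
print(overall_quality([1, 1, 1, 4]))  # a single 4 -> "Unusable"
```

As the sketch makes visible, a study’s fate hinges on where an unweighted average falls relative to arbitrary thresholds, and one bad metric overrides everything else, which is exactly the numeric-scoring practice NASEM advised against.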

So, EPA has disregarded the NASEM and continued with a scoring method nearly identical to the one used in the original TSCA Method.

Inappropriate Appraisal Criteria

Unlike a bias, characteristics of a high-quality study do not influence the overall direction and size of the result, a distinction NASEM highlighted in its TSCA Method report.

For example, as NASEM states, “Statistical power and statistical significance are not markers of risk of bias or quality.” In fact, combining multiple small, low-powered but similar studies in a synthesis is one of the potential benefits of systematic review.
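That benefit can be shown numerically. In the sketch below, with hypothetical effect estimates and standard errors, inverse-variance pooling of several small studies produces a combined standard error smaller than any single study’s, which is why low statistical power alone is a poor reason to downgrade a study.

```python
# Illustrative pooling of small, individually imprecise studies
# (all numbers hypothetical).
import math

def inverse_variance_pool(effects, ses):
    """Fixed-effect, inverse-variance pooled effect and standard error."""
    weights = [1 / se**2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Five small studies, each too imprecise to show a clear effect on its own.
effects = [0.30, 0.25, 0.35, 0.28, 0.32]
ses = [0.20, 0.22, 0.25, 0.21, 0.19]

pooled, pooled_se = inverse_variance_pool(effects, ses)

# The pooled SE is well below every individual SE: the synthesis can detect
# an effect that none of the studies could establish alone.
print(f"Pooled effect: {pooled:.3f} (SE {pooled_se:.3f}, smallest single-study SE {min(ses):.2f})")
```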

That is why the NASEM recommended EPA “Use established tools for assessing risk of bias and study quality such as those developed for use by OHAT or the Navigation Guide, or, at a minimum, remove inappropriate appraisal criteria from the current tools.”  

However, despite these clear NASEM recommendations, EPA continues to use statistical power as a study quality metric in Table Apx R-7, Evaluation Criteria for Epidemiological Studies (Metric 13).

So, EPA has not followed the NASEM recommendation to “remove inappropriate appraisal criteria from the current tools.”

To ensure unbiased, accurate chemical risk evaluations, PRHE strongly recommends EPA follow NASEM’s recommendations on evaluating study quality.