Data Mining in the U.S. National Toxicology Program (NTP) Database

Project leader:
Project members:
Start date: 1 March 2015
End date: 31 May 2015
Funding source:


Long-term studies in rodents are the benchmark method for assessing the carcinogenicity of single substances, mixtures, and multi-compounds. In such a study, mice and rats are exposed to a test agent at different dose levels for two years, and the incidence of neoplastic lesions is observed. However, this two-year study is expensive, time-consuming, and burdensome to the experimental animals. Consequently, various alternatives have been proposed in the literature to assess carcinogenicity on the basis of short-term studies.

In this project, we investigated whether effects on the rodents’ liver weight in short-term studies can be exploited to predict the incidence of liver tumors in long-term studies. A set of 138 paired short- and long-term studies was compiled from the database of the U.S. National Toxicology Program (NTP), more precisely, from (long-term) two-year carcinogenicity studies and their preceding (short-term) dose-finding studies. In this set, data mining methods revealed patterns that can predict the incidence of liver tumors with accuracies of over 80%. At the same time, the results indicated a potential bias regarding liver tumors in two-year NTP studies: the incidence of liver tumors depends not only on the test agent but also on confounding factors in the study design, e.g., species, sex, and type of substance.
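
One way to make the role of such confounding factors concrete is to compare a predictor against a baseline that sees only the study design and nothing about the test agent. The following is a minimal sketch under stated assumptions, not the method used in this project: the record fields (species, sex, liver_tumor), the function names, and the six records are hypothetical and made up for illustration; they are not NTP data. The baseline is scored on the same records it was derived from, so its accuracy is optimistic.

```python
from collections import Counter, defaultdict


def accuracy(predicted, observed):
    """Fraction of studies whose predicted tumor outcome matches the observed one."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def design_only_baseline(studies, keys=("species", "sex")):
    """Predict each study's outcome as the majority outcome of its design stratum.

    The prediction uses no information about the test agent, so a high
    accuracy here would point to confounding by study design.
    """
    by_stratum = defaultdict(Counter)
    for study in studies:
        stratum = tuple(study[k] for k in keys)
        by_stratum[stratum][study["liver_tumor"]] += 1
    return [
        by_stratum[tuple(study[k] for k in keys)].most_common(1)[0][0]
        for study in studies
    ]


# Made-up illustrative records (hypothetical field names), NOT NTP data.
studies = [
    {"species": "mouse", "sex": "male", "liver_tumor": True},
    {"species": "mouse", "sex": "male", "liver_tumor": True},
    {"species": "mouse", "sex": "male", "liver_tumor": False},
    {"species": "rat", "sex": "female", "liver_tumor": False},
    {"species": "rat", "sex": "female", "liver_tumor": False},
    {"species": "rat", "sex": "female", "liver_tumor": True},
]

observed = [s["liver_tumor"] for s in studies]
baseline_accuracy = accuracy(design_only_baseline(studies), observed)
print(f"design-only baseline accuracy: {baseline_accuracy:.2f}")
```

If a design-only baseline of this kind already reaches a high accuracy on real data, part of the accuracy of a liver-weight-based predictor may reflect the study design rather than the test agent.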

We recommend considering this bias whenever the hazard or risk of a test agent is assessed on the basis of an NTP carcinogenicity study.