Run the full sequence that scrapes, parses, and stores the NCI Drug Dictionary from CancerGov.org, along with any mappings to the NCI Thesaurus, in a Postgres database.

process_drug_link_synonym(
  conn,
  sleep_time = 3,
  expiration_days = 30,
  verbose = TRUE,
  render_sql = TRUE,
  encoding = "",
  options = c("RECOVER", "NOERROR", "NOBLANKS")
)

Arguments

conn

Postgres connection object.

sleep_time

Time in seconds for the system to sleep before each scrape with read_html.

verbose

If TRUE, prints progress output on every iteration. This is useful on a slow connection for confirming that the process is still running.
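To make the interaction between sleep_time and verbose concrete, here is a minimal sketch of a paginated scrape loop. It is not the package's implementation. The helper collect_drug_links, the "?page=" URL pattern, and the fetch_links callback are hypothetical. fetch_links stands in for the real read_html-based link extraction, which keeps the example runnable offline.

```r
# Hypothetical sketch, not the package's code: visit dictionary pages
# 1..max_page, pause `sleep_time` seconds before each fetch, optionally
# report progress, and collect the drug-page links returned by `fetch_links`.
collect_drug_links <- function(base_url, max_page, fetch_links,
                               sleep_time = 3, verbose = TRUE) {
  links <- character(0)
  for (page in seq_len(max_page)) {
    # Hypothetical URL pattern; the real CancerGov pagination scheme may differ.
    page_url <- paste0(base_url, "?page=", page)
    Sys.sleep(sleep_time)  # throttle requests to the remote server
    if (verbose) message("Scraping ", page_url)
    links <- c(links, fetch_links(page_url))
  }
  # A drug may be linked from more than one page, so drop duplicates.
  unique(links)
}

# Usage with a fake fetcher (no network access needed):
fake_fetch <- function(url) c(paste0(url, "#a"), "shared-link")
res <- collect_drug_links("https://example.org/dict", 3, fake_fetch,
                          sleep_time = 0, verbose = FALSE)
```

Keeping the fetcher as an argument separates the throttling and progress logic from the HTML parsing, so each piece can be checked on its own.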

Details

The links to individual drug pages are scraped from every page of the Drug Dictionary, up to the maximum page number, and saved to a Drug Link Table in the cancergov schema. Each URL in the Drug Link Table is then scraped for HTML tables of synonyms, and the results are written to a Drug Link Synonym Table. Links to active clinical trials and mappings to the NCI Thesaurus (NCIt) are also derived and stored in their respective tables.
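The synonym-extraction step can be illustrated with a short sketch using rvest. The encoding and options defaults in the usage above match the defaults of xml2's read_html, which suggests that they are forwarded to it. That is an assumption, not something this page states. The HTML string and its column names are hypothetical stand-ins for a real drug page.

```r
library(rvest)  # re-exports read_html() from xml2

# Hypothetical stand-in for a scraped drug page; the real CancerGov markup
# and column names are assumptions, not taken from the package.
drug_page_html <- paste0(
  "<html><body><table>",
  "<tr><th>Synonym type</th><th>Name</th></tr>",
  "<tr><td>abbreviation</td><td>ABC</td></tr>",
  "</table></body></html>"
)

# Parse using the same values this function exposes for `encoding` and `options`.
page <- read_html(drug_page_html, encoding = "",
                  options = c("RECOVER", "NOERROR", "NOBLANKS"))

# Each <table> on the page becomes one data frame; a header row made of
# <th> cells supplies the column names.
synonym_tables <- html_table(page)
synonym_tables[[1]]
```

In a full pipeline the resulting data frames would be tagged with their source URL and appended to the Drug Link Synonym Table.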

See also