Run Scrape

Run the complete scrape for a given resource.

cg_run()

Run the Cancer.gov Scrape and Store the Results

cdp_run()

Run the ChemIDplus Scrape on All HemOnc Drugs

pm_run()

Run the complete PubMed Scrape

Scrape Cancer.gov Drug Dictionary

Scrape and store the parsed data from the Cancer.gov Drug Dictionary

drug_count()

Get the Drug Count in the Drug Dictionary

log_drug_count()

Log the Drug Count in the Drug Dictionary

get_dictionary_and_links()

Scrape the Drug Definitions and Links from the NCI Drug Dictionary

get_drug_link_synonym()

Get the Synonyms found at a given Drug Link

get_drug_link_url()

Get the URLs found in a Drug Link

get_ncit_synonym()

Scrape Synonyms from the NCI Thesaurus (NCIt)

process_drug_link_ncit()

Process the NCIt CUI from the Drug Link URL Table

process_drug_link_synonym()

Process the Links found in the Drug Link Table for Synonyms

process_drug_link_url()

Process the Links found in the Drug Link Table for NCIt and other URLs

update_cancergov_drugs()

Update the Cancergov Drugs Table

Scrape ChemIDplus

Scrape and store the parsed data from ChemIDplus search results

get_registry_numbers()

Scrape the "Registry Numbers" Section at a Registry Number URL

log_registry_number()

Log Registry Number Matches for a Search

get_rn_url_validity()

Check that the Registry Number URL is Valid

get_classification_code()

Scrape the Classification Code in the Summary Header of the RN URL

get_classification()

Scrape the "Classification" Section at a Registry Number URL

get_links_to_resources()

Scrape the "Links to Resources" Section at a Registry Number URL

get_names_and_synonyms()

Scrape the "Names and Synonyms" Section at a Registry Number URL

Scrape PubMed

Scrape and store the parsed data from PubMed search results

get_pm()

Scrape PubMed Publications

get_pm_earliest()

Get Earliest PubMed Publications

get_pm_latest()

Get Latest PubMed Publications

start_pm()

Create PubMed Tables in Patelm9 Schema

Methods

Methods for querying the scraped data

lookup_ncit_code()

Look Up an NCIt Code