R/cdp-internals.R
get_registry_numbers.Rd
All ChemIDplus scraping functions operate on a Registry Number URL (rn_url). The initial search is logged to the "REGISTRY_NUMBER_LOG" table. The RN URL is then tested for a 404 status, and the result is logged to the "RN_URL_VALIDITY" table. The major sections found at the ChemIDplus site are "Names and Synonyms", "Classification", "Registry Numbers", and "Links to Resources". These sections are written to their respective tables: "NAMES_AND_SYNONYMS", "CLASSIFICATION", "REGISTRY_NUMBERS", and "LINKS_TO_RESOURCES".
get_registry_numbers(conn, rn_url, response, schema = schema, sleep_time = 3, verbose = TRUE)
| Argument | Description |
|---|---|
| conn | Postgres connection object |
| rn_url | Registry number URL to read that also serves as an Identifier |
| response | (optional) "xml_document" "xml_node" class object returned by xml2::read_html for the |
| schema | Schema that the returned data is written to, Default: 'chemidplus' |
| sleep_time | If the response argument is missing, the number of seconds to pause after reading the URL, Default: 3 |
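
The relationship between `response` and `sleep_time` described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: `read_or_reuse()` is a hypothetical helper name, the example URL is a placeholder, and the xml2 package is assumed to be installed.

```r
library(xml2)

# Hypothetical helper (not part of the package): reuse a parsed response when
# one is supplied; otherwise read the URL and pause before returning.
read_or_reuse <- function(rn_url, response, sleep_time = 3) {
  if (missing(response)) {
    response <- read_html(rn_url)  # a network request happens only here
    Sys.sleep(sleep_time)          # pause only after an actual read
  }
  response
}

# Usage with a pre-parsed document: no request is made and no pause occurs.
doc <- read_html("<html><body><p>placeholder</p></body></html>")
same_doc <- read_or_reuse("https://example.org/rn/0000-00-0", response = doc)
```

Passing an already-parsed `response` lets several section scrapers share one read of the same page, so the pause is paid once per URL rather than once per section.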
Each section is parsed by its own skyscraper function, which stores the scraped results in a table of the same name in the given schema. If a connection argument is not provided, the results are returned as a data frame in the R console.
The "Registry Numbers" section contains the identifiers assigned to the given drug by other agencies.
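
As a rough sketch of what parsing such a section into a data frame might look like, the example below works on an invented HTML fragment. Everything specific here is an assumption made for illustration: the markup, the `#registryNumbers` selector, the agency names, the placeholder URL, and the output column names are hypothetical, and the real ChemIDplus page structure may differ.

```r
library(xml2)
library(rvest)
library(tibble)

# Invented HTML standing in for a "Registry Numbers" section (assumption:
# the real page markup may look quite different).
html <- '
<div id="registryNumbers">
  <h3>Other Registry Numbers</h3>
  <ul>
    <li>Example Agency A: 12345</li>
    <li>Example Agency B: AB-678</li>
  </ul>
</div>'

response <- read_html(html)

# Extract the text of each list item inside the assumed section container.
items <- html_text(html_nodes(response, "#registryNumbers li"))

# Split each "agency: identifier" item at its first colon.
agency     <- sub(":.*$", "", items)
identifier <- trimws(sub("^[^:]*:", "", items))

# Keep the RN URL alongside each row so it can serve as an identifier.
registry_numbers <- tibble(
  rn_url     = "https://example.org/rn/0000-00-0",  # placeholder URL
  agency     = agency,
  identifier = identifier
)
```

Here the result is simply kept as a data frame, which corresponds to the no-connection case; writing it to a "REGISTRY_NUMBERS" table would be an additional step that depends on the database helpers in use.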
read_xml
lsSchema,createSchema,lsTables,query,buildQuery,appendTable,writeTable
html_nodes,html_text
str_remove
strsplit
mutate,bind,mutate_all,distinct
map2,set_names,map
as_tibble
Other chemidplus scraping:
get_classification(),
get_links_to_resources(),
get_names_and_synonyms(),
get_rn_url_validity(),
log_registry_number()