All ChemiDPlus scraping functions operate on a Registry Number URL (rn_url). The initial search is logged to the "REGISTRY_NUMBER_LOG" Table. The RN URL is then tested for a 404 status, and the result is logged to the "RN_URL_VALIDITY" Table. The major sections found at the ChemiDPlus site are "Names and Synonyms", "Classification", "Registry Numbers", and "Links to Resources". These sections are written to their respective tables: "NAMES_AND_SYNONYMS", "CLASSIFICATION", "REGISTRY_NUMBERS", and "LINKS_TO_RESOURCES".
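
The section and table names listed above follow a simple pattern: upper case, with spaces replaced by underscores. The snippet below is a minimal sketch of that mapping in base R. It is an illustration only and is not taken from the package source, which may build the names differently.

```r
# Section headings as they are listed in the description above.
sections <- c(
  "Names and Synonyms",
  "Classification",
  "Registry Numbers",
  "Links to Resources"
)

# Assumed convention: upper-case the heading and replace spaces with underscores.
table_names <- toupper(gsub(" ", "_", sections, fixed = TRUE))

# Named vector that maps each section heading to its table name.
section_tables <- setNames(table_names, sections)
print(section_tables)
```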

log_registry_number(
  conn,
  raw_search_term,
  search_type = "contains",
  sleep_time = 3,
  schema = "chemidplus",
  verbose = TRUE
)

Arguments

conn

Postgres connection object

raw_search_term

Character string of length 1 to be searched in ChemiDPlus

sleep_time

If the response argument is missing, the number of seconds to pause after reading the URL, Default: 3

schema

Schema that the returned data is written to, Default: 'chemidplus'

search_type

Type of search available at ChemiDPlus, Default: "contains"

Value

Each section is parsed by a respective skyscraper function that stores the scraped results in a table of the same name in a schema. If a connection argument is not provided, the results are returned as a dataframe in the R console.
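
The "write to a table, or return a dataframe when no connection is given" behavior described above could look roughly like the sketch below. The helper name `store_or_return` and its arguments are hypothetical and are not part of the package. The database branch assumes the DBI package and a live Postgres connection, so only the no-connection branch is exercised here.

```r
# Hypothetical helper, not from the package: write results to a table when a
# connection is supplied, otherwise hand the dataframe back to the caller.
store_or_return <- function(results, table_name, conn = NULL, schema = "chemidplus") {
  if (is.null(conn)) {
    # No connection: return the scraped results to the R console.
    return(results)
  }
  # Connection supplied: append to <schema>.<table_name> (requires DBI).
  DBI::dbWriteTable(
    conn,
    name = DBI::Id(schema = schema, table = table_name),
    value = results,
    append = TRUE
  )
  invisible(results)
}

# Placeholder data for illustration only, not real scraped output.
scraped <- data.frame(
  rn_url = "https://example.org/rn/placeholder",
  classification = "placeholder",
  stringsAsFactors = FALSE
)

returned <- store_or_return(scraped, table_name = "CLASSIFICATION")
```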

Registry Number Log Table

The REGISTRY_NUMBER_LOG Table is the landing table for any ChemiDPlus search made using skyscrape. Here, a source concept is searched based on a given set of parameters, and every Registry Number (RN) that the source concept can be associated with in ChemiDPlus is logged. Each Registry Number then serves as a jump-off point: a second URL, the RN URL, is read, split by section, and written into the corresponding ChemiDPlus Tables.

The Table logs the Raw Concept, the processed version of the Concept (i.e. with spaces and error-throwing characters removed to generate a valid search URL for the Concept), the type of search (i.e. equals or contains), and the final search URL used to read a search result. A series of boolean flags records whether the search was performed (i.e. a response was received), whether the results contained any records, and whether those records were saved. If an RN was found, it is included along with the full URL associated with the RN.
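
As a rough sketch of the processing step described above, the snippet below turns a raw concept into a processed concept and a search URL, then lays out one log row. The function name, the column names, and the base URL (`https://example.org/...`) are all hypothetical placeholders. The real ChemiDPlus URL scheme and the package's actual column names are not shown in this page and may differ.

```r
# Hypothetical helper: none of these names come from the package itself.
build_search_url <- function(raw_concept,
                             search_type = c("contains", "equals"),
                             base_url = "https://example.org/chemidplus/name") {
  search_type <- match.arg(search_type)
  # Drop whitespace, then percent-encode characters that would break a URL.
  processed <- gsub("[[:space:]]+", "", raw_concept)
  processed <- utils::URLencode(processed, reserved = TRUE)
  list(
    processed_concept = processed,
    url = paste(base_url, search_type, processed, sep = "/")
  )
}

search <- build_search_url("acetyl salicylic acid", "contains")

# One log row; the boolean flags stay NA until a response has been read.
log_row <- data.frame(
  raw_concept = "acetyl salicylic acid",
  processed_concept = search$processed_concept,
  search_type = "contains",
  url = search$url,
  response_received = NA,
  records_found = NA,
  records_saved = NA,
  rn = NA_character_,
  rn_url = NA_character_,
  stringsAsFactors = FALSE
)
```

Keeping the raw and processed concepts side by side in the row makes it possible to trace any search URL back to the term that was originally supplied.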

See also