vignettes/pivot-concept-relationships.Rmd
pivot-concept-relationships.Rmd
conn <- connectAthena()
Pivoting the Concept Relationship table can provide a rich array representing the various relationships an OMOP Concept may have.
To exemplify, a test dataset of RxNorm and RxNorm Extension concepts are retrieved.
test_data <-
get_test_drug_concepts(conn = conn)
#> [2021-05-26 16:10:33] Overriding cache... Querying Athena...
#> [2021-05-26 16:10:33] ✓ Open connection
#> [2021-05-26 16:10:33] ✓ JDBC connection
#> [2021-05-26 16:10:33] SQL: SELECT * FROM omop_vocabulary.concept c WHERE c.invalid_reason IS NULL AND c.vocabulary_id IN ('RxNorm', 'RxNorm Extension') AND c.standard_concept <> 'C' AND c.domain_id = 'Drug' ORDER BY RANDOM() LIMIT 100
#> [2021-05-26 16:10:33] Querying...
#> [2021-05-26 16:10:36] Querying...complete
#> [2021-05-26 16:10:36] ✓ Returned data has more than 0 rows
#> [2021-05-26 16:10:36] Caching resultset...
test_data
#> # A tibble: 100 x 10
#> concept_id concept_name domain_id vocabulary_id concept_class_id
#> <int> <chr> <chr> <chr> <chr>
#> 1 36264635 atorvastatin Oral Table… Drug RxNorm Extens… Branded Drug Fo…
#> 2 35764325 Domperidone 10 MG Delay… Drug RxNorm Extens… Marketed Product
#> 3 2061001 ferric hydroxide Oral S… Drug RxNorm Extens… Branded Drug Fo…
#> 4 41260353 Glucosamine 250 MG Oral… Drug RxNorm Extens… Marketed Product
#> 5 21081445 Amoxicillin 250 MG / Cl… Drug RxNorm Extens… Marketed Product
#> 6 41480145 15000 MG Betamethasone … Drug RxNorm Extens… Quant Branded D…
#> 7 41429074 150 ML benzenesulfonic … Drug RxNorm Extens… Quant Clinical …
#> 8 1594614 sarilumab 175 MG/ML Pre… Drug RxNorm Clinical Drug
#> 9 43779151 Centaurium erythraea pr… Drug RxNorm Extens… Clinical Drug F…
#> 10 21143236 Atenolol 100 MG Oral Ta… Drug RxNorm Extens… Marketed Product
#> # … with 90 more rows, and 5 more variables: standard_concept <chr>,
#> # concept_code <chr>, valid_start_date <date>, valid_end_date <date>,
#> # invalid_reason <chr>
The relationships of these concepts are derived.
test_data_relationships <- lookup_relationships(test_data$concept_id,
conn = conn)
test_data_relationships
#> # A tibble: 927 x 21
#> relationship_id concept_id_1 concept_name_1 domain_id_1 vocabulary_id_1
#> <chr> <int> <chr> <chr> <chr>
#> 1 Marketed form of 36261375 lamotrigine 100 MG… Drug RxNorm Extensi…
#> 2 RxNorm has dose… 36261375 lamotrigine 100 MG… Drug RxNorm Extensi…
#> 3 Mapped from 36261375 lamotrigine 100 MG… Drug RxNorm Extensi…
#> 4 Maps to 36261375 lamotrigine 100 MG… Drug RxNorm Extensi…
#> 5 RxNorm has dose… 36268722 drospirenone 3 MG … Drug RxNorm Extensi…
#> 6 Has supplier 36268722 drospirenone 3 MG … Drug RxNorm Extensi…
#> 7 Mapped from 36268722 drospirenone 3 MG … Drug RxNorm Extensi…
#> 8 Maps to 36268722 drospirenone 3 MG … Drug RxNorm Extensi…
#> 9 Marketed form of 36268722 drospirenone 3 MG … Drug RxNorm Extensi…
#> 10 RxNorm has dose… 36409773 1 ML Heroin 0.5 MG… Drug RxNorm Extensi…
#> # … with 917 more rows, and 16 more variables: concept_class_id_1 <chr>,
#> # standard_concept_1 <chr>, concept_code_1 <chr>, valid_start_date_1 <date>,
#> # valid_end_date_1 <date>, invalid_reason_1 <chr>, concept_id_2 <int>,
#> # concept_name_2 <chr>, domain_id_2 <chr>, vocabulary_id_2 <chr>,
#> # concept_class_id_2 <chr>, standard_concept_2 <chr>, concept_code_2 <chr>,
#> # valid_start_date_2 <date>, valid_end_date_2 <date>, invalid_reason_2 <chr>
To properly pivot the resultset, the concepts on both sides require transformation into a single cell per concept. This is achieved by converting concepts into the strip
format.
test_data_relationships2 <-
test_data_relationships %>%
merge_strip(into = "concept_1",
suffix = "_1") %>%
merge_strip(into = "concept_2",
suffix = "_2")
test_data_relationships2
#> # A tibble: 927 x 5
#> concept_2_id concept_2 concept_1_id concept_1 relationship_id
#> <int> <chr> <int> <chr> <chr>
#> 1 19003128 [V] [S] 19003128… 36261375 [V] [S] 36261375… Marketed form …
#> 2 19082076 [V] [N] 19082076… 36261375 [V] [S] 36261375… RxNorm has dos…
#> 3 36261375 [V] [S] 36261375… 36261375 [V] [S] 36261375… Mapped from
#> 4 36261375 [V] [S] 36261375… 36261375 [V] [S] 36261375… Maps to
#> 5 19001949 [V] [N] 19001949… 36268722 [V] [S] 36268722… RxNorm has dos…
#> 6 21019390 [V] [N] 21019390… 36268722 [V] [S] 36268722… Has supplier
#> 7 36268722 [V] [S] 36268722… 36268722 [V] [S] 36268722… Mapped from
#> 8 36268722 [V] [S] 36268722… 36268722 [V] [S] 36268722… Maps to
#> 9 36270728 [V] [S] 36270728… 36268722 [V] [S] 36268722… Marketed form …
#> 10 19082103 [V] [N] 19082103… 36409773 [V] [S] 36409773… RxNorm has dos…
#> # … with 917 more rows
The dataset can now be pivoted by the relationship, hinging on concept_1
, which are the concepts that were originally inputted.
output <-
test_data_relationships2 %>%
pivot_wider(id_cols = concept_1,
names_from = relationship_id,
values_from = concept_2)
#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
output
#> # A tibble: 100 x 26
#> concept_1 `Marketed form o… `RxNorm has dos… `Mapped from` `Maps to`
#> <chr> <list> <list> <list> <list>
#> 1 [V] [S] 36261375 … <chr [4]> <chr [1]> <chr [3]> <chr [1]>
#> 2 [V] [S] 36268722 … <chr [2]> <chr [1]> <chr [1]> <chr [1]>
#> 3 [V] [S] 36409773 … <NULL> <chr [1]> <chr [4]> <chr [1]>
#> 4 [V] [S] 40241175 … <NULL> <chr [1]> <chr [10]> <chr [1]>
#> 5 [V] [S] 40880992 … <NULL> <chr [1]> <chr [1]> <chr [1]>
#> 6 [V] [S] 40968650 … <NULL> <chr [1]> <chr [1]> <chr [1]>
#> 7 [V] [S] 41186279 … <NULL> <chr [1]> <chr [1]> <chr [1]>
#> 8 [V] [S] 41209582 … <NULL> <chr [1]> <chr [1]> <chr [1]>
#> 9 [V] [S] 41246120 … <NULL> <chr [1]> <chr [1]> <chr [1]>
#> 10 [V] [S] 41316281 … <NULL> <chr [1]> <chr [1]> <chr [1]>
#> # … with 90 more rows, and 21 more variables: Has supplier <list>,
#> # Has marketed form <list>, Concept replaces <list>, RxNorm is a <list>,
#> # Box of <list>, Tradename of <list>, Available as box <list>,
#> # RxNorm has ing <list>, Has tradename <list>, Has brand name <list>,
#> # Has quantified form <list>, RxNorm inverse is a <list>, Constitutes <list>,
#> # RxNorm - VAProd eq <list>, RxNorm - SPL <list>, Quantified form of <list>,
#> # Contained in <list>, Consists of <list>, Drug has drug class <list>,
#> # RxNorm - ATC <list>, RxNorm - SNOMED eq <list>
Since at times a concept_1
may have a specific relationship to more than 1 concept_2
, a list-cols are returned in the output. To create a more readable output, an aggregate function on concept_2
must be supplied to convert multiple concept_2
values into a string of length 1.
output <-
test_data_relationships2 %>%
pivot_wider(id_cols = concept_1,
names_from = relationship_id,
values_from = concept_2,
# Add aggregate function
values_fn = list(concept_2 = ~ paste(unique(.), collapse = "|"))
)
output
#> # A tibble: 100 x 26
#> concept_1 `Marketed form of` `RxNorm has dose … `Mapped from` `Maps to`
#> <chr> <chr> <chr> <chr> <chr>
#> 1 [V] [S] 362… [V] [S] 19003128 … [V] [N] 19082076 … [V] [S] 36261… [V] [S] 36…
#> 2 [V] [S] 362… [V] [S] 36270728 … [V] [N] 19001949 … [V] [S] 36268… [V] [S] 36…
#> 3 [V] [S] 364… <NA> [V] [N] 19082103 … [V] [S] 36409… [V] [S] 36…
#> 4 [V] [S] 402… <NA> [V] [N] 19082168 … [U] [N] 19123… [V] [S] 40…
#> 5 [V] [S] 408… <NA> [V] [N] 19082573 … [V] [S] 40880… [V] [S] 40…
#> 6 [V] [S] 409… <NA> [V] [N] 19082573 … [V] [S] 40968… [V] [S] 40…
#> 7 [V] [S] 411… <NA> [V] [N] 19082229 … [V] [S] 41186… [V] [S] 41…
#> 8 [V] [S] 412… <NA> [V] [N] 19082573 … [V] [S] 41209… [V] [S] 41…
#> 9 [V] [S] 412… <NA> [V] [N] 19082168 … [V] [S] 41246… [V] [S] 41…
#> 10 [V] [S] 413… <NA> [V] [N] 19082103 … [V] [S] 41316… [V] [S] 41…
#> # … with 90 more rows, and 21 more variables: Has supplier <chr>,
#> # Has marketed form <chr>, Concept replaces <chr>, RxNorm is a <chr>,
#> # Box of <chr>, Tradename of <chr>, Available as box <chr>,
#> # RxNorm has ing <chr>, Has tradename <chr>, Has brand name <chr>,
#> # Has quantified form <chr>, RxNorm inverse is a <chr>, Constitutes <chr>,
#> # RxNorm - VAProd eq <chr>, RxNorm - SPL <chr>, Quantified form of <chr>,
#> # Contained in <chr>, Consists of <chr>, Drug has drug class <chr>,
#> # RxNorm - ATC <chr>, RxNorm - SNOMED eq <chr>
The resulting dataset contains a row for each original concept provided with each relationship and concept/s it has the relationship to in a pipe-separated string.