Pivot Concept Relationships • chariot

library(chariot)
library(tidyverse)

conn <- connectAthena()

Pivoting the Concept Relationship table can provide a rich array representing the various relationships an OMOP Concept may have.

To exemplify, a test dataset of RxNorm and RxNorm Extension concepts are retrieved.

test_data <- 
  get_test_drug_concepts(conn = conn)
#> [2021-05-26 16:10:33]    Overriding cache... Querying Athena...
#> [2021-05-26 16:10:33]    ✓ Open connection
#> [2021-05-26 16:10:33]    ✓ JDBC connection
#> [2021-05-26 16:10:33]    SQL: SELECT * FROM omop_vocabulary.concept c WHERE c.invalid_reason IS NULL AND c.vocabulary_id IN ('RxNorm', 'RxNorm Extension') AND c.standard_concept <> 'C' AND c.domain_id = 'Drug' ORDER BY RANDOM() LIMIT 100
#> [2021-05-26 16:10:33]    Querying...
#> [2021-05-26 16:10:36]    Querying...complete
#> [2021-05-26 16:10:36]    ✓ Returned data has more than 0 rows
#> [2021-05-26 16:10:36]    Caching resultset...

test_data
#> # A tibble: 100 x 10
#>    concept_id concept_name             domain_id vocabulary_id  concept_class_id
#>         <int> <chr>                    <chr>     <chr>          <chr>           
#>  1   36264635 atorvastatin Oral Table… Drug      RxNorm Extens… Branded Drug Fo…
#>  2   35764325 Domperidone 10 MG Delay… Drug      RxNorm Extens… Marketed Product
#>  3    2061001 ferric hydroxide Oral S… Drug      RxNorm Extens… Branded Drug Fo…
#>  4   41260353 Glucosamine 250 MG Oral… Drug      RxNorm Extens… Marketed Product
#>  5   21081445 Amoxicillin 250 MG / Cl… Drug      RxNorm Extens… Marketed Product
#>  6   41480145 15000 MG Betamethasone … Drug      RxNorm Extens… Quant Branded D…
#>  7   41429074 150 ML benzenesulfonic … Drug      RxNorm Extens… Quant Clinical …
#>  8    1594614 sarilumab 175 MG/ML Pre… Drug      RxNorm         Clinical Drug   
#>  9   43779151 Centaurium erythraea pr… Drug      RxNorm Extens… Clinical Drug F…
#> 10   21143236 Atenolol 100 MG Oral Ta… Drug      RxNorm Extens… Marketed Product
#> # … with 90 more rows, and 5 more variables: standard_concept <chr>,
#> #   concept_code <chr>, valid_start_date <date>, valid_end_date <date>,
#> #   invalid_reason <chr>

The relationships of these concepts are derived.

test_data_relationships <- lookup_relationships(test_data$concept_id,
                                                conn = conn)

test_data_relationships
#> # A tibble: 927 x 21
#>    relationship_id  concept_id_1 concept_name_1      domain_id_1 vocabulary_id_1
#>    <chr>                   <int> <chr>               <chr>       <chr>          
#>  1 Marketed form of     36261375 lamotrigine 100 MG… Drug        RxNorm Extensi…
#>  2 RxNorm has dose…     36261375 lamotrigine 100 MG… Drug        RxNorm Extensi…
#>  3 Mapped from          36261375 lamotrigine 100 MG… Drug        RxNorm Extensi…
#>  4 Maps to              36261375 lamotrigine 100 MG… Drug        RxNorm Extensi…
#>  5 RxNorm has dose…     36268722 drospirenone 3 MG … Drug        RxNorm Extensi…
#>  6 Has supplier         36268722 drospirenone 3 MG … Drug        RxNorm Extensi…
#>  7 Mapped from          36268722 drospirenone 3 MG … Drug        RxNorm Extensi…
#>  8 Maps to              36268722 drospirenone 3 MG … Drug        RxNorm Extensi…
#>  9 Marketed form of     36268722 drospirenone 3 MG … Drug        RxNorm Extensi…
#> 10 RxNorm has dose…     36409773 1 ML Heroin 0.5 MG… Drug        RxNorm Extensi…
#> # … with 917 more rows, and 16 more variables: concept_class_id_1 <chr>,
#> #   standard_concept_1 <chr>, concept_code_1 <chr>, valid_start_date_1 <date>,
#> #   valid_end_date_1 <date>, invalid_reason_1 <chr>, concept_id_2 <int>,
#> #   concept_name_2 <chr>, domain_id_2 <chr>, vocabulary_id_2 <chr>,
#> #   concept_class_id_2 <chr>, standard_concept_2 <chr>, concept_code_2 <chr>,
#> #   valid_start_date_2 <date>, valid_end_date_2 <date>, invalid_reason_2 <chr>

To properly pivot the resultset, the concepts on both sides require transformation into a single cell per concept. This is achieved by converting concepts into the strip format.

test_data_relationships2 <-
  test_data_relationships %>%
  merge_strip(into = "concept_1", 
              suffix = "_1") %>%
  merge_strip(into = "concept_2",
              suffix = "_2")

test_data_relationships2
#> # A tibble: 927 x 5
#>    concept_2_id concept_2         concept_1_id concept_1         relationship_id
#>           <int> <chr>                    <int> <chr>             <chr>          
#>  1     19003128 [V] [S] 19003128…     36261375 [V] [S] 36261375… Marketed form …
#>  2     19082076 [V] [N] 19082076…     36261375 [V] [S] 36261375… RxNorm has dos…
#>  3     36261375 [V] [S] 36261375…     36261375 [V] [S] 36261375… Mapped from    
#>  4     36261375 [V] [S] 36261375…     36261375 [V] [S] 36261375… Maps to        
#>  5     19001949 [V] [N] 19001949…     36268722 [V] [S] 36268722… RxNorm has dos…
#>  6     21019390 [V] [N] 21019390…     36268722 [V] [S] 36268722… Has supplier   
#>  7     36268722 [V] [S] 36268722…     36268722 [V] [S] 36268722… Mapped from    
#>  8     36268722 [V] [S] 36268722…     36268722 [V] [S] 36268722… Maps to        
#>  9     36270728 [V] [S] 36270728…     36268722 [V] [S] 36268722… Marketed form …
#> 10     19082103 [V] [N] 19082103…     36409773 [V] [S] 36409773… RxNorm has dos…
#> # … with 917 more rows

The dataset can now be pivoted by the relationship, hinging on concept_1, which are the concepts that were originally inputted.

output <-
  test_data_relationships2 %>%
  pivot_wider(id_cols = concept_1,
              names_from = relationship_id, 
              values_from = concept_2)
#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
output
#> # A tibble: 100 x 26
#>    concept_1          `Marketed form o… `RxNorm has dos… `Mapped from` `Maps to`
#>    <chr>              <list>            <list>           <list>        <list>   
#>  1 [V] [S] 36261375 … <chr [4]>         <chr [1]>        <chr [3]>     <chr [1]>
#>  2 [V] [S] 36268722 … <chr [2]>         <chr [1]>        <chr [1]>     <chr [1]>
#>  3 [V] [S] 36409773 … <NULL>            <chr [1]>        <chr [4]>     <chr [1]>
#>  4 [V] [S] 40241175 … <NULL>            <chr [1]>        <chr [10]>    <chr [1]>
#>  5 [V] [S] 40880992 … <NULL>            <chr [1]>        <chr [1]>     <chr [1]>
#>  6 [V] [S] 40968650 … <NULL>            <chr [1]>        <chr [1]>     <chr [1]>
#>  7 [V] [S] 41186279 … <NULL>            <chr [1]>        <chr [1]>     <chr [1]>
#>  8 [V] [S] 41209582 … <NULL>            <chr [1]>        <chr [1]>     <chr [1]>
#>  9 [V] [S] 41246120 … <NULL>            <chr [1]>        <chr [1]>     <chr [1]>
#> 10 [V] [S] 41316281 … <NULL>            <chr [1]>        <chr [1]>     <chr [1]>
#> # … with 90 more rows, and 21 more variables: Has supplier <list>,
#> #   Has marketed form <list>, Concept replaces <list>, RxNorm is a <list>,
#> #   Box of <list>, Tradename of <list>, Available as box <list>,
#> #   RxNorm has ing <list>, Has tradename <list>, Has brand name <list>,
#> #   Has quantified form <list>, RxNorm inverse is a <list>, Constitutes <list>,
#> #   RxNorm - VAProd eq <list>, RxNorm - SPL <list>, Quantified form of <list>,
#> #   Contained in <list>, Consists of <list>, Drug has drug class <list>,
#> #   RxNorm - ATC <list>, RxNorm - SNOMED eq <list>

Since at times a concept_1 may have a specific relationship to more than 1 concept_2, a list-cols are returned in the output. To create a more readable output, an aggregate function on concept_2 must be supplied to convert multiple concept_2 values into a string of length 1.

output <-
  test_data_relationships2 %>%
  pivot_wider(id_cols = concept_1,
              names_from = relationship_id, 
              values_from = concept_2,
              # Add aggregate function
              values_fn = list(concept_2 = ~ paste(unique(.), collapse = "|"))
              )
output
#> # A tibble: 100 x 26
#>    concept_1    `Marketed form of` `RxNorm has dose … `Mapped from`  `Maps to`  
#>    <chr>        <chr>              <chr>              <chr>          <chr>      
#>  1 [V] [S] 362… [V] [S] 19003128 … [V] [N] 19082076 … [V] [S] 36261… [V] [S] 36…
#>  2 [V] [S] 362… [V] [S] 36270728 … [V] [N] 19001949 … [V] [S] 36268… [V] [S] 36…
#>  3 [V] [S] 364… <NA>               [V] [N] 19082103 … [V] [S] 36409… [V] [S] 36…
#>  4 [V] [S] 402… <NA>               [V] [N] 19082168 … [U] [N] 19123… [V] [S] 40…
#>  5 [V] [S] 408… <NA>               [V] [N] 19082573 … [V] [S] 40880… [V] [S] 40…
#>  6 [V] [S] 409… <NA>               [V] [N] 19082573 … [V] [S] 40968… [V] [S] 40…
#>  7 [V] [S] 411… <NA>               [V] [N] 19082229 … [V] [S] 41186… [V] [S] 41…
#>  8 [V] [S] 412… <NA>               [V] [N] 19082573 … [V] [S] 41209… [V] [S] 41…
#>  9 [V] [S] 412… <NA>               [V] [N] 19082168 … [V] [S] 41246… [V] [S] 41…
#> 10 [V] [S] 413… <NA>               [V] [N] 19082103 … [V] [S] 41316… [V] [S] 41…
#> # … with 90 more rows, and 21 more variables: Has supplier <chr>,
#> #   Has marketed form <chr>, Concept replaces <chr>, RxNorm is a <chr>,
#> #   Box of <chr>, Tradename of <chr>, Available as box <chr>,
#> #   RxNorm has ing <chr>, Has tradename <chr>, Has brand name <chr>,
#> #   Has quantified form <chr>, RxNorm inverse is a <chr>, Constitutes <chr>,
#> #   RxNorm - VAProd eq <chr>, RxNorm - SPL <chr>, Quantified form of <chr>,
#> #   Contained in <chr>, Consists of <chr>, Drug has drug class <chr>,
#> #   RxNorm - ATC <chr>, RxNorm - SNOMED eq <chr>

The resulting dataset contains a row for each original concept provided with each relationship and concept/s it has the relationship to in a pipe-separated string.