Last updated on: 2021-01-04
conn <- chariot::connectAthena()
This vignette takes a look at the Value Sets in the mCode Data Dictionary to create the standard library of Oncology concepts, available for loading in this package.
The mCode valuesets are retrieved from the publicly available data dictionary.
value_sets <- get_value_sets()
value_sets
## # A tibble: 6,603 x 5
## `Value Set Name` `Code System` `Logical Definitio… Code `Code Description`
## <chr> <chr> <chr> <chr> <chr>
## 1 CancerBodyLocatio… SNOMED CT includes codes des… <NA> <NA>
## 2 CancerDiseaseStat… SNOMED CT <NA> 3636… Imaging (procedur…
## 3 CancerDiseaseStat… SNOMED CT <NA> 2524… Histopathology te…
## 4 CancerDiseaseStat… SNOMED CT <NA> 7110… Assessment of sym…
## 5 CancerDiseaseStat… SNOMED CT <NA> 5880… Physical examinat…
## 6 CancerDiseaseStat… SNOMED CT <NA> 2507… Tumor marker meas…
## 7 CancerDiseaseStat… SNOMED CT <NA> 3863… Laboratory data i…
## 8 CancerDisorderVS ICD-10-CM <NA> C000 Malignant neoplas…
## 9 CancerDisorderVS ICD-10-CM <NA> C001 Malignant neoplas…
## 10 CancerDisorderVS ICD-10-CM <NA> C002 Malignant neoplas…
## # … with 6,593 more rows
To map the valuesets to OMOP concepts, it is split by Code System
, the equivalent to the vocabulary_id
field in OMOP’s Concept table, for joining. As of 2021-01-04, there are 9 Code Systems
used in mCode.
value_sets_by_vocab <- value_sets %>% rubix::split_by(col = `Code System`)
names(value_sets_by_vocab)
## [1] "ClinVar"
## [2] "http://cancerstaging.org"
## [3] "http://terminology.hl7.org/CodeSystem/v2-0487"
## [4] "http://unitsofmeasure.org"
## [5] "http://varnomen.hgvs.org"
## [6] "http://www.genenames.org/geneId"
## [7] "ICD-10-CM"
## [8] "LOINC"
## [9] "SNOMED CT"
In mCode, Specimen representation is based on HL7’s Code System while in OMOP, the Specimen domain has its own subset of concepts. Furthermore, mCode’s
Specimen representation maps to other OMOP Concept Ids. The Specimen representation in this package therefore has 2 parts:
Specimen representation is derived from the valuesets.
specimen_library <- value_sets_by_vocab$`http://terminology.hl7.org/CodeSystem/v2-0487` %>%
rubix::format_colnames()
head(specimen_library)
## # A tibble: 6 x 5
## value_set_name code_system logical_definit… code code_description
## <chr> <chr> <chr> <chr> <chr>
## 1 GeneticSpecimen… http://terminology.h… <NA> AMN Amniotic fluid
## 2 GeneticSpecimen… http://terminology.h… <NA> BIFL Bile Fluid
## 3 GeneticSpecimen… http://terminology.h… <NA> BLD Whole blood
## 4 GeneticSpecimen… http://terminology.h… <NA> BLDA Blood arterial
## 5 GeneticSpecimen… http://terminology.h… <NA> BLDCO Cord blood
## 6 GeneticSpecimen… http://terminology.h… <NA> BLDV Blood venous
The specimen description is joined on the Concept Synonym Names in the Concept Synonym table to retrieve the Concept Id the mCode Specimen maps to.
specimen_omop_library1 <- join_on_concept_synonym_name(data = specimen_library,
column = "code_description", case_insensitive = TRUE, conn = conn) %>%
select(-concept_synonym_name, -language_concept_id) %>% rename(match_concept_id = concept_id)
head(specimen_omop_library1)
## value_set_name code_system
## 1 GeneticSpecimenTypeVS http://terminology.hl7.org/CodeSystem/v2-0487
## 2 GeneticSpecimenTypeVS http://terminology.hl7.org/CodeSystem/v2-0487
## 3 GeneticSpecimenTypeVS http://terminology.hl7.org/CodeSystem/v2-0487
## 4 GeneticSpecimenTypeVS http://terminology.hl7.org/CodeSystem/v2-0487
## 5 GeneticSpecimenTypeVS http://terminology.hl7.org/CodeSystem/v2-0487
## 6 GeneticSpecimenTypeVS http://terminology.hl7.org/CodeSystem/v2-0487
## logical_definition code code_description match_concept_id
## 1 <NA> SKN Skin 1027716
## 2 <NA> PLC Placenta 1032268
## 3 <NA> SAL Saliva 1033195
## 4 <NA> SPT Sputum 1033465
## 5 <NA> WND Wound 1033739
## 6 <NA> BON Bone 1585831
The complete Concept representation is taken from the Concept Id.
specimen_omop_library2 <- join_on_concept_id(data = specimen_omop_library1,
column = "match_concept_id", conn = conn)
head(specimen_omop_library2)
## value_set_name code_system
## 1 GeneticSpecimenTypeVS http://terminology.hl7.org/CodeSystem/v2-0487
## 2 GeneticSpecimenTypeVS http://terminology.hl7.org/CodeSystem/v2-0487
## 3 GeneticSpecimenTypeVS http://terminology.hl7.org/CodeSystem/v2-0487
## 4 GeneticSpecimenTypeVS http://terminology.hl7.org/CodeSystem/v2-0487
## 5 GeneticSpecimenTypeVS http://terminology.hl7.org/CodeSystem/v2-0487
## 6 GeneticSpecimenTypeVS http://terminology.hl7.org/CodeSystem/v2-0487
## logical_definition code code_description match_concept_id concept_id
## 1 <NA> SKN Skin 1027716 1027716
## 2 <NA> PLC Placenta 1032268 1032268
## 3 <NA> SAL Saliva 1033195 1033195
## 4 <NA> SPT Sputum 1033465 1033465
## 5 <NA> WND Wound 1033739 1033739
## 6 <NA> BON Bone 1585831 1585831
## concept_name domain_id vocabulary_id concept_class_id
## 1 Skin Observation LOINC LOINC System
## 2 Placenta Observation LOINC LOINC System
## 3 Saliva Observation LOINC LOINC System
## 4 Sputum Observation LOINC LOINC System
## 5 Wound Observation LOINC LOINC System
## 6 Organ Transplant Description: Bone Observation PPI Answer
## standard_concept concept_code valid_start_date
## 1 <NA> LP36760-4 1970-01-01
## 2 <NA> LP7477-5 1970-01-01
## 3 <NA> LP7565-7 1970-01-01
## 4 <NA> LP7600-2 1970-01-01
## 5 <NA> LP7726-5 1970-01-01
## 6 <NA> OrganTransplantDescription_Bone 2017-05-17
## valid_end_date invalid_reason
## 1 2099-12-31 <NA>
## 2 2099-12-31 <NA>
## 3 2099-12-31 <NA>
## 4 2099-12-31 <NA>
## 5 2099-12-31 <NA>
## 6 2099-12-31 <NA>
To maintain a one-to-one representation of the original grain of information, the OMOP mappings are pivoted on the OMOP Domain to see how each mCode Specimen maps to by domain.
specimen_omop_library3 <- specimen_omop_library2 %>% merge_strip(into = "concept",
domain_id) %>% pivot_wider(id_cols = !concept, names_from = domain_id,
values_from = concept) %>% select(-match_concept_id) %>%
distinct()
head(specimen_omop_library3)
## # A tibble: 6 x 14
## concept_id value_set_name code_system logical_definit… code code_description
## <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 1027716 GeneticSpecim… http://ter… <NA> SKN Skin
## 2 1032268 GeneticSpecim… http://ter… <NA> PLC Placenta
## 3 1033195 GeneticSpecim… http://ter… <NA> SAL Saliva
## 4 1033465 GeneticSpecim… http://ter… <NA> SPT Sputum
## 5 1033739 GeneticSpecim… http://ter… <NA> WND Wound
## 6 1585831 GeneticSpecim… http://ter… <NA> BON Bone
## # … with 8 more variables: Observation <chr>, Condition <chr>, `Spec Anatomic
## # Site` <chr>, Drug <chr>, `Meas Value` <chr>, Measurement <chr>,
## # Specimen <chr>, `NA` <chr>
To gather a complete representation of Specimens, any missing Specimen concepts in OMOP are added to the library.
omop_specimen <- chariot::queryAthena("SELECT *
FROM omop_vocabulary.concept
WHERE domain_id = 'Specimen' AND concept_class_id = 'Specimen';",
conn = conn) %>% merge_strip(into = "Specimen") %>% select(-Specimen_id)
head(omop_specimen)
## # A tibble: 6 x 1
## Specimen
## <chr>
## 1 [V] [S] 759813 Physical object specimen [SNOMED 1021000124109] [Specimen] [Sp…
## 2 [V] [S] 759820 Paper specimen [SNOMED 1031000124107] [Specimen] [Specimen]
## 3 [V] [S] 759825 Writing paper specimen [SNOMED 1041000124102] [Specimen] [Spec…
## 4 [V] [S] 759826 Envelope specimen [SNOMED 1051000124100] [Specimen] [Specimen]
## 5 [V] [S] 759829 Package specimen [SNOMED 1061000124103] [Specimen] [Specimen]
## 6 [V] [S] 759869 Clothing specimen [SNOMED 1071000124105] [Specimen] [Specimen]
specimen_omop_library <- specimen_omop_library3 %>% full_join(omop_specimen,
by = "Specimen") %>% distinct()
head(specimen_omop_library)
## # A tibble: 6 x 14
## concept_id value_set_name code_system logical_definit… code code_description
## <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 1027716 GeneticSpecim… http://ter… <NA> SKN Skin
## 2 1032268 GeneticSpecim… http://ter… <NA> PLC Placenta
## 3 1033195 GeneticSpecim… http://ter… <NA> SAL Saliva
## 4 1033465 GeneticSpecim… http://ter… <NA> SPT Sputum
## 5 1033739 GeneticSpecim… http://ter… <NA> WND Wound
## 6 1585831 GeneticSpecim… http://ter… <NA> BON Bone
## # … with 8 more variables: Observation <chr>, Condition <chr>, `Spec Anatomic
## # Site` <chr>, Drug <chr>, `Meas Value` <chr>, Measurement <chr>,
## # Specimen <chr>, `NA` <chr>
This file is written to the data-raw/
folder for distribution if it does not already exist.
file <- file.path(getwd(), "data-raw", "specimen.csv")
if (!file.exists(file)) {
write_csv(x = specimen_omop_library, file = file)
}
## Error in open.connection(file, "wb"): cannot open the connection
mCode uses the AJCC TNM Staging system, which correlates with NCIt concepts in the OMOP vocabulary.
cancer_staging_library <- value_sets_by_vocab$`http://cancerstaging.org` %>%
rubix::format_colnames()
head(cancer_staging_library)
## # A tibble: 4 x 5
## value_set_name code_system logical_definition code code_description
## <chr> <chr> <chr> <chr> <chr>
## 1 TNMDistantMetast… http://cancer… includes codes from c… <NA> <NA>
## 2 TNMPrimaryTumorC… http://cancer… includes codes from c… <NA> <NA>
## 3 TNMRegionalNodes… http://cancer… includes codes from c… <NA> <NA>
## 4 TNMStageGroupVS http://cancer… includes codes from c… <NA> <NA>
As it is represented in the OMOP Vocabulary, NCIt is not separated by Tumor, Node, and Metastasis like it is in mCode.
ncit_omop_library <- chariot::queryAthena("SELECT *
FROM omop_vocabulary.concept
WHERE vocabulary_id = 'NCIt' AND concept_class_id = 'AJCC Category';",
conn = conn)
head(ncit_omop_library)
## # A tibble: 6 x 10
## concept_id concept_name domain_id vocabulary_id concept_class_id
## <dbl> <chr> <chr> <chr> <chr>
## 1 1537692 Vulvar Canc… Measurem… NCIt AJCC Category
## 2 1537693 Retinoblast… Measurem… NCIt AJCC Category
## 3 1537694 Vulvar Canc… Measurem… NCIt AJCC Category
## 4 1537695 Nasopharyng… Measurem… NCIt AJCC Category
## 5 1537700 Adrenal Cor… Measurem… NCIt AJCC Category
## 6 1537780 Retinoblast… Measurem… NCIt AJCC Category
## # … with 5 more variables: standard_concept <chr>, concept_code <chr>,
## # valid_start_date <date>, valid_end_date <date>, invalid_reason <chr>
NCIt AJCC Category
concepts are therefore grouped based on pattern matching with the Concept Code.
ncit_omop_library2 <- ncit_omop_library
ncit_omop_library2$value_set_name <- ""
ncit_omop_library2 <- ncit_omop_library2 %>% mutate(value_set_name = case_when(grepl("^[cp]{1}M",
concept_code) ~ "TNMDistantMetastasesCategoryVS", grepl("^[cp]{1}N",
concept_code) ~ "TNMRegionalNodesCategoryVS", grepl("^[cp]{1}T",
concept_code) ~ "TNMPrimaryTumorCategoryVS"))
head(ncit_omop_library2)
## # A tibble: 6 x 11
## concept_id concept_name domain_id vocabulary_id concept_class_id
## <dbl> <chr> <chr> <chr> <chr>
## 1 1537692 Vulvar Canc… Measurem… NCIt AJCC Category
## 2 1537693 Retinoblast… Measurem… NCIt AJCC Category
## 3 1537694 Vulvar Canc… Measurem… NCIt AJCC Category
## 4 1537695 Nasopharyng… Measurem… NCIt AJCC Category
## 5 1537700 Adrenal Cor… Measurem… NCIt AJCC Category
## 6 1537780 Retinoblast… Measurem… NCIt AJCC Category
## # … with 6 more variables: standard_concept <chr>, concept_code <chr>,
## # valid_start_date <date>, valid_end_date <date>, invalid_reason <chr>,
## # value_set_name <chr>
A correlate to mCode’s TNMStageGroupVS
value set is not present in the OMOP Vocabularies, but since it can be derived from the TNM categories, it is skipped.
cancer_staging_omop_library <- cancer_staging_library %>% left_join(ncit_omop_library2,
by = "value_set_name")
head(cancer_staging_omop_library)
## # A tibble: 6 x 15
## value_set_name code_system logical_definit… code code_description concept_id
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 TNMDistantMet… http://can… includes codes … <NA> <NA> 1537692
## 2 TNMDistantMet… http://can… includes codes … <NA> <NA> 1537780
## 3 TNMDistantMet… http://can… includes codes … <NA> <NA> 1537798
## 4 TNMDistantMet… http://can… includes codes … <NA> <NA> 1537804
## 5 TNMDistantMet… http://can… includes codes … <NA> <NA> 1537805
## 6 TNMDistantMet… http://can… includes codes … <NA> <NA> 1537808
## # … with 9 more variables: concept_name <chr>, domain_id <chr>,
## # vocabulary_id <chr>, concept_class_id <chr>, standard_concept <chr>,
## # concept_code <chr>, valid_start_date <date>, valid_end_date <date>,
## # invalid_reason <chr>
This data is written to a cancer_staging.csv
in the data-raw/
folder for distribution if it does not already exist.
file <- file.path(getwd(), "data-raw", "cancer_staging.csv")
if (!file.exists(file)) {
write_csv(x = cancer_staging_omop_library, file = file)
}
## Error in open.connection(file, "wb"): cannot open the connection
HGVS, HGNC, and ClinVar are collapsed into a single Genomics
category.
mCode uses HGVS (http://varnomen.hgvs.org), but these codes are not in the OMOP Vocabulary and cannot be derived from the website. Therefore, it is skipped for now.
hgvs_library <- value_sets_by_vocab$`http://varnomen.hgvs.org` %>%
rubix::format_colnames()
head(hgvs_library)
## # A tibble: 1 x 5
## value_set_name code_system logical_definition code code_description
## <chr> <chr> <chr> <chr> <chr>
## 1 HGVSVS http://varnomen… All codes in http://va… <NA> <NA>
Gene Names in mCode are presumed to be derived from HGNC and the entire HGNC subset of the OMOP Vocabularies are included in the library.
hgnc_library <- value_sets_by_vocab$`http://www.genenames.org/geneId` %>%
rubix::format_colnames()
head(hgnc_library)
## # A tibble: 1 x 5
## value_set_name code_system logical_definition code code_description
## <chr> <chr> <chr> <chr> <chr>
## 1 HGNCVS http://www.gene… All codes in http://ww… <NA> <NA>
hgnc_omop_library1 <- chariot::queryAthena("SELECT *
FROM omop_vocabulary.concept
WHERE vocabulary_id = 'HGNC';",
conn = conn)
head(hgnc_omop_library1)
## # A tibble: 6 x 10
## concept_id concept_name domain_id vocabulary_id concept_class_id
## <dbl> <chr> <chr> <chr> <chr>
## 1 35944910 CCDC77 (coi… Measurem… HGNC Gene
## 2 35944911 INMT (indol… Measurem… HGNC Gene
## 3 35944912 ZNF117 (zin… Measurem… HGNC Gene
## 4 35944913 CKAP2 (cyto… Measurem… HGNC Gene
## 5 35944914 ITPRIPL2 (I… Measurem… HGNC Gene
## 6 35944916 PEX19 (pero… Measurem… HGNC Gene
## # … with 5 more variables: standard_concept <chr>, concept_code <chr>,
## # valid_start_date <date>, valid_end_date <date>, invalid_reason <chr>
The OMOP HGNC concepts are joined to the mCode data dictionary set.
hgnc_omop_library2 <- hgnc_omop_library1 %>% mutate(value_set_name = "HGNCVS")
head(hgnc_omop_library2)
## # A tibble: 6 x 11
## concept_id concept_name domain_id vocabulary_id concept_class_id
## <dbl> <chr> <chr> <chr> <chr>
## 1 35944910 CCDC77 (coi… Measurem… HGNC Gene
## 2 35944911 INMT (indol… Measurem… HGNC Gene
## 3 35944912 ZNF117 (zin… Measurem… HGNC Gene
## 4 35944913 CKAP2 (cyto… Measurem… HGNC Gene
## 5 35944914 ITPRIPL2 (I… Measurem… HGNC Gene
## 6 35944916 PEX19 (pero… Measurem… HGNC Gene
## # … with 6 more variables: standard_concept <chr>, concept_code <chr>,
## # valid_start_date <date>, valid_end_date <date>, invalid_reason <chr>,
## # value_set_name <chr>
hgnc_omop_library <- hgnc_library %>% left_join(hgnc_omop_library2,
by = "value_set_name")
head(hgnc_omop_library)
## # A tibble: 6 x 15
## value_set_name code_system logical_definit… code code_description concept_id
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 HGNCVS http://www… All codes in ht… <NA> <NA> 35944910
## 2 HGNCVS http://www… All codes in ht… <NA> <NA> 35944911
## 3 HGNCVS http://www… All codes in ht… <NA> <NA> 35944912
## 4 HGNCVS http://www… All codes in ht… <NA> <NA> 35944913
## 5 HGNCVS http://www… All codes in ht… <NA> <NA> 35944914
## 6 HGNCVS http://www… All codes in ht… <NA> <NA> 35944916
## # … with 9 more variables: concept_name <chr>, domain_id <chr>,
## # vocabulary_id <chr>, concept_class_id <chr>, standard_concept <chr>,
## # concept_code <chr>, valid_start_date <date>, valid_end_date <date>,
## # invalid_reason <chr>
Like HGNC, the entire ClinVar subset of the OMOP Vocabularies are included in the library.
clinvar_library <- value_sets_by_vocab$ClinVar %>% rubix::format_colnames()
head(clinvar_library)
## # A tibble: 1 x 5
## value_set_name code_system logical_definition code code_description
## <chr> <chr> <chr> <chr> <chr>
## 1 ClinVarVS ClinVar Includes codes from ClinVar <NA> <NA>
clinvar_omop_library1 <- chariot::queryAthena("SELECT *
FROM omop_vocabulary.concept
WHERE vocabulary_id = 'ClinVar';",
conn = conn)
head(clinvar_omop_library1)
## # A tibble: 6 x 10
## concept_id concept_name domain_id vocabulary_id concept_class_id
## <dbl> <chr> <chr> <chr> <chr>
## 1 35968119 NC_000002.1… Measurem… ClinVar Variant
## 2 35968121 NM_000038.6… Measurem… ClinVar Variant
## 3 35968122 NM_000038.6… Measurem… ClinVar Variant
## 4 35968123 NM_000038.6… Measurem… ClinVar Variant
## 5 35968124 NM_000038.6… Measurem… ClinVar Variant
## 6 35968125 NM_000038.6… Measurem… ClinVar Variant
## # … with 5 more variables: standard_concept <chr>, concept_code <chr>,
## # valid_start_date <date>, valid_end_date <date>, invalid_reason <chr>
clinvar_omop_library2 <- clinvar_omop_library1 %>% mutate(value_set_name = "ClinVarVS")
head(clinvar_omop_library2)
## # A tibble: 6 x 11
## concept_id concept_name domain_id vocabulary_id concept_class_id
## <dbl> <chr> <chr> <chr> <chr>
## 1 35968119 NC_000002.1… Measurem… ClinVar Variant
## 2 35968121 NM_000038.6… Measurem… ClinVar Variant
## 3 35968122 NM_000038.6… Measurem… ClinVar Variant
## 4 35968123 NM_000038.6… Measurem… ClinVar Variant
## 5 35968124 NM_000038.6… Measurem… ClinVar Variant
## 6 35968125 NM_000038.6… Measurem… ClinVar Variant
## # … with 6 more variables: standard_concept <chr>, concept_code <chr>,
## # valid_start_date <date>, valid_end_date <date>, invalid_reason <chr>,
## # value_set_name <chr>
clinvar_omop_library <- clinvar_library %>% left_join(clinvar_omop_library2,
by = "value_set_name")
head(clinvar_omop_library)
## # A tibble: 6 x 15
## value_set_name code_system logical_definit… code code_description concept_id
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 ClinVarVS ClinVar Includes codes … <NA> <NA> 35968119
## 2 ClinVarVS ClinVar Includes codes … <NA> <NA> 35968121
## 3 ClinVarVS ClinVar Includes codes … <NA> <NA> 35968122
## 4 ClinVarVS ClinVar Includes codes … <NA> <NA> 35968123
## 5 ClinVarVS ClinVar Includes codes … <NA> <NA> 35968124
## 6 ClinVarVS ClinVar Includes codes … <NA> <NA> 35968125
## # … with 9 more variables: concept_name <chr>, domain_id <chr>,
## # vocabulary_id <chr>, concept_class_id <chr>, standard_concept <chr>,
## # concept_code <chr>, valid_start_date <date>, valid_end_date <date>,
## # invalid_reason <chr>
The final genomics library file is written if it does not already exist.
genomics_omop_library <- bind_rows(hgnc_omop_library, clinvar_omop_library)
file <- file.path(getwd(), "data-raw", "genomics.csv")
if (!file.exists(file)) {
write_csv(x = genomics_omop_library, file = file)
}
## Error in open.connection(file, "wb"): cannot open the connection
Like Specimen, the Units of Measure representation is limited and requires incorporation of the OMOP UCUM
vocabulary valueset.
uom_library <- value_sets_by_vocab$`http://unitsofmeasure.org` %>%
rubix::format_colnames()
head(uom_library)
## # A tibble: 6 x 5
## value_set_name code_system logical_definiti… code code_description
## <chr> <chr> <chr> <chr> <chr>
## 1 UnitsOfLengthVS http://unitsofmeasur… <NA> pm Picometer
## 2 UnitsOfLengthVS http://unitsofmeasur… <NA> nm Nanometer
## 3 UnitsOfLengthVS http://unitsofmeasur… <NA> mm Millimeter
## 4 UnitsOfLengthVS http://unitsofmeasur… <NA> cm Centimeter
## 5 UnitsOfLengthVS http://unitsofmeasur… <NA> m Meter
## 6 UnitsOfLengthVS http://unitsofmeasur… <NA> ft-us Foot
ucum_omop_library1 <- chariot::queryAthena("SELECT *
FROM omop_vocabulary.concept
WHERE vocabulary_id = 'UCUM';",
conn = conn)
head(ucum_omop_library1)
## # A tibble: 6 x 10
## concept_id concept_name domain_id vocabulary_id concept_class_id
## <dbl> <chr> <chr> <chr> <chr>
## 1 8478 avidity ind… Unit UCUM Unit
## 2 8479 centipoise Unit UCUM Unit
## 3 8480 Ehrlich unit Unit UCUM Unit
## 4 8481 EV Unit UCUM Unit
## 5 8482 pH Unit UCUM Unit
## 6 8483 counts per … Unit UCUM Unit
## # … with 5 more variables: standard_concept <chr>, concept_code <chr>,
## # valid_start_date <date>, valid_end_date <date>, invalid_reason <chr>
uom_omop_library <- uom_library %>% bind_rows(ucum_omop_library1)
head(uom_omop_library)
## # A tibble: 6 x 15
## value_set_name code_system logical_definit… code code_description concept_id
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 UnitsOfLength… http://uni… <NA> pm Picometer NA
## 2 UnitsOfLength… http://uni… <NA> nm Nanometer NA
## 3 UnitsOfLength… http://uni… <NA> mm Millimeter NA
## 4 UnitsOfLength… http://uni… <NA> cm Centimeter NA
## 5 UnitsOfLength… http://uni… <NA> m Meter NA
## 6 UnitsOfLength… http://uni… <NA> ft-us Foot NA
## # … with 9 more variables: concept_name <chr>, domain_id <chr>,
## # vocabulary_id <chr>, concept_class_id <chr>, standard_concept <chr>,
## # concept_code <chr>, valid_start_date <date>, valid_end_date <date>,
## # invalid_reason <chr>
Write the data to a file if it does not already exist.
file <- file.path(getwd(), "data-raw", "unitsofmeasurement.csv")
if (!file.exists(file)) {
write_csv(x = uom_omop_library, file = file)
}
## Error in open.connection(file, "wb"): cannot open the connection
The SNOMED subset of the library contains:
To derive all concepts, the code in the logical_definition
field is extracted based on the listed scenario above.
snomed_library <- value_sets_by_vocab$`SNOMED CT` %>% rubix::format_colnames() %>%
distinct() %>% mutate_all(as.character) %>% extract(col = logical_definition,
into = "descendants_of", regex = "includes codes descending from ([0-9]{1,})[^0-9]{1}.*$",
remove = FALSE) %>% extract(col = logical_definition, into = "ancestors_of",
regex = "excludes codes descending from ([0-9]{1,})[^0-9]{1}.*$",
remove = FALSE) %>% mutate(all_codes = coalesce(code, ancestors_of,
descendants_of)) %>% mutate_all(trimws)
head(snomed_library)
## # A tibble: 6 x 8
## value_set_name code_system logical_definit… ancestors_of descendants_of code
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 CancerBodyLoc… SNOMED CT includes codes … <NA> 123037004 <NA>
## 2 CancerDisease… SNOMED CT <NA> <NA> <NA> 3636…
## 3 CancerDisease… SNOMED CT <NA> <NA> <NA> 2524…
## 4 CancerDisease… SNOMED CT <NA> <NA> <NA> 7110…
## 5 CancerDisease… SNOMED CT <NA> <NA> <NA> 5880…
## 6 CancerDisease… SNOMED CT <NA> <NA> <NA> 2507…
## # … with 2 more variables: code_description <chr>, all_codes <chr>
OMOP Concept table is joined to the mCode’s SNOMED library by code.
snomed_omop_library1 <- chariot::join_on_concept_code(kind = "LEFT",
data = snomed_library, column = "all_codes", where_in_concept_field = "vocabulary_id",
where_in_concept_field_value = "SNOMED")
snomed_omop_library1 <- snomed_library %>% left_join(snomed_omop_library1,
by = c("value_set_name", "ancestors_of", "descendants_of",
"all_codes", "code_system", "logical_definition", "code",
"code_description")) %>% distinct()
head(snomed_omop_library1)
## # A tibble: 6 x 18
## value_set_name code_system logical_definit… ancestors_of descendants_of code
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 CancerBodyLoc… SNOMED CT includes codes … <NA> 123037004 <NA>
## 2 CancerDisease… SNOMED CT <NA> <NA> <NA> 3636…
## 3 CancerDisease… SNOMED CT <NA> <NA> <NA> 2524…
## 4 CancerDisease… SNOMED CT <NA> <NA> <NA> 7110…
## 5 CancerDisease… SNOMED CT <NA> <NA> <NA> 5880…
## 6 CancerDisease… SNOMED CT <NA> <NA> <NA> 2507…
## # … with 12 more variables: code_description <chr>, all_codes <chr>,
## # concept_id <dbl>, concept_name <chr>, domain_id <chr>, vocabulary_id <chr>,
## # concept_class_id <chr>, standard_concept <chr>, concept_code <chr>,
## # valid_start_date <date>, valid_end_date <date>, invalid_reason <chr>
The resultset is then split into 3 based on whether the descendants (a) or ancestors (b) need to be derived, or if the concept is explicitly stated by code (c).
The descendants are derived for those mCode concepts that included descendants.
snomed_omop_library2a <- snomed_omop_library1 %>% filter(!is.na(descendants_of))
snomed_omop_library2a2 <- join_for_descendants(kind = "LEFT",
data = snomed_omop_library2a, ancestor_id_column = "concept_id") %>%
select(all_of(colnames(snomed_library)), starts_with("descendant_")) %>%
rename_all(~str_remove_all(., pattern = "^descendant_"))
head(snomed_omop_library2a2)
## value_set_name code_system
## 1 CancerBodyLocationVS SNOMED CT
## 2 CancerBodyLocationVS SNOMED CT
## 3 CancerBodyLocationVS SNOMED CT
## 4 CancerBodyLocationVS SNOMED CT
## 5 CancerBodyLocationVS SNOMED CT
## 6 CancerBodyLocationVS SNOMED CT
## logical_definition ancestors_of
## 1 includes codes descending from 123037004 | Body Structure <NA>
## 2 includes codes descending from 123037004 | Body Structure <NA>
## 3 includes codes descending from 123037004 | Body Structure <NA>
## 4 includes codes descending from 123037004 | Body Structure <NA>
## 5 includes codes descending from 123037004 | Body Structure <NA>
## 6 includes codes descending from 123037004 | Body Structure <NA>
## descendants_of code code_description all_codes concept_id
## 1 123037004 <NA> <NA> 123037004 4048384
## 2 123037004 <NA> <NA> 123037004 4002852
## 3 123037004 <NA> <NA> 123037004 36717763
## 4 123037004 <NA> <NA> 123037004 4230944
## 5 123037004 <NA> <NA> 123037004 42605189
## 6 123037004 <NA> <NA> 123037004 4097829
## concept_name domain_id
## 1 Body structure Spec Anatomic Site
## 2 Buccal embrasure Spec Anatomic Site
## 3 Skin structure of left lower eyelid Spec Anatomic Site
## 4 Adenofibrosis Observation
## 5 Intervertebral foramen of eighteenth thoracic vertebra Spec Anatomic Site
## 6 Transitional cell carcinoma Observation
## vocabulary_id concept_class_id standard_concept concept_code
## 1 SNOMED Body Structure S 123037004
## 2 SNOMED Body Structure S 110326006
## 3 SNOMED Body Structure S 719884003
## 4 SNOMED Morph Abnormality S 89115006
## 5 SNOMED Veterinary Body Structure S 336621000009107
## 6 SNOMED Morph Abnormality S 27090000
## valid_start_date valid_end_date invalid_reason
## 1 1970-01-01 2099-12-31 <NA>
## 2 1970-01-01 2099-12-31 <NA>
## 3 2017-01-31 2099-12-31 <NA>
## 4 1970-01-01 2099-12-31 <NA>
## 5 2014-01-31 2099-12-31 <NA>
## 6 1970-01-01 2099-12-31 <NA>
The ancestors are derived for those mCode concepts that excluded descendants.
snomed_omop_library2b <- snomed_omop_library1 %>% filter(!is.na(ancestors_of))
snomed_omop_library2b2 <- join_for_ancestors(kind = "LEFT", data = snomed_omop_library2b,
descendant_id_column = "concept_id") %>% filter(min_levels_of_separation !=
0) %>% select(all_of(colnames(snomed_library)), starts_with("ancestor_")) %>%
rename_all(~str_remove_all(., pattern = "^ancestor_"))
head(snomed_omop_library2b2)
## value_set_name code_system
## 1 HistologyMorphologyBehaviorVS SNOMED CT
## 2 HistologyMorphologyBehaviorVS SNOMED CT
## 3 HistologyMorphologyBehaviorVS SNOMED CT
## 4 HistologyMorphologyBehaviorVS SNOMED CT
## 5 HistologyMorphologyBehaviorVS SNOMED CT
## 6 HistologyMorphologyBehaviorVS SNOMED CT
## logical_definition
## 1 excludes codes descending from 399983006 | Papillary neoplasm, pancreatobiliary-type, with high grade intraepithelial neoplasia (morphologic abnormality)
## 2 excludes codes descending from 399983006 | Papillary neoplasm, pancreatobiliary-type, with high grade intraepithelial neoplasia (morphologic abnormality)
## 3 excludes codes descending from 399983006 | Papillary neoplasm, pancreatobiliary-type, with high grade intraepithelial neoplasia (morphologic abnormality)
## 4 excludes codes descending from 399983006 | Papillary neoplasm, pancreatobiliary-type, with high grade intraepithelial neoplasia (morphologic abnormality)
## 5 excludes codes descending from 399983006 | Papillary neoplasm, pancreatobiliary-type, with high grade intraepithelial neoplasia (morphologic abnormality)
## 6 excludes codes descending from 399983006 | Papillary neoplasm, pancreatobiliary-type, with high grade intraepithelial neoplasia (morphologic abnormality)
## ancestors_of descendants_of code code_description all_codes concept_id
## 1 399983006 <NA> <NA> <NA> 399983006 4019483
## 2 399983006 <NA> <NA> <NA> 399983006 4030314
## 3 399983006 <NA> <NA> <NA> 399983006 4042149
## 4 399983006 <NA> <NA> <NA> 399983006 4043350
## 5 399983006 <NA> <NA> <NA> 399983006 4048384
## 6 399983006 <NA> <NA> <NA> 399983006 4133294
## concept_name domain_id vocabulary_id
## 1 Adenoma AND/OR adenocarcinoma Observation SNOMED
## 2 Neoplasm Observation SNOMED
## 3 Epithelial neoplasm Observation SNOMED
## 4 Morphologically altered structure Observation SNOMED
## 5 Body structure Spec Anatomic Site SNOMED
## 6 In situ neoplasm Observation SNOMED
## concept_class_id standard_concept concept_code valid_start_date
## 1 Morph Abnormality S 115215004 1970-01-01
## 2 Morph Abnormality S 108369006 1970-01-01
## 3 Morph Abnormality S 118285006 1970-01-01
## 4 Morph Abnormality S 118956008 1970-01-01
## 5 Body Structure S 123037004 1970-01-01
## 6 Morph Abnormality S 127569003 1970-01-01
## valid_end_date invalid_reason
## 1 2099-12-31 <NA>
## 2 2099-12-31 <NA>
## 3 2099-12-31 <NA>
## 4 2099-12-31 <NA>
## 5 2099-12-31 <NA>
## 6 2099-12-31 <NA>
The concepts that are explicitly stated do not require any additional derivation.
snomed_omop_library2c <- snomed_omop_library1 %>% filter(is.na(descendants_of),
is.na(ancestors_of))
head(snomed_omop_library2c)
## # A tibble: 6 x 18
## value_set_name code_system logical_definit… ancestors_of descendants_of code
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 CancerDisease… SNOMED CT <NA> <NA> <NA> 3636…
## 2 CancerDisease… SNOMED CT <NA> <NA> <NA> 2524…
## 3 CancerDisease… SNOMED CT <NA> <NA> <NA> 7110…
## 4 CancerDisease… SNOMED CT <NA> <NA> <NA> 5880…
## 5 CancerDisease… SNOMED CT <NA> <NA> <NA> 2507…
## 6 CancerDisease… SNOMED CT <NA> <NA> <NA> 3863…
## # … with 12 more variables: code_description <chr>, all_codes <chr>,
## # concept_id <dbl>, concept_name <chr>, domain_id <chr>, vocabulary_id <chr>,
## # concept_class_id <chr>, standard_concept <chr>, concept_code <chr>,
## # valid_start_date <date>, valid_end_date <date>, invalid_reason <chr>
All 3 subsets are recombined before writing to file.
snomed_omop_library <- bind_rows(snomed_omop_library2a2, snomed_omop_library2b2,
snomed_omop_library2c)
head(snomed_omop_library)
## value_set_name code_system
## 1 CancerBodyLocationVS SNOMED CT
## 2 CancerBodyLocationVS SNOMED CT
## 3 CancerBodyLocationVS SNOMED CT
## 4 CancerBodyLocationVS SNOMED CT
## 5 CancerBodyLocationVS SNOMED CT
## 6 CancerBodyLocationVS SNOMED CT
## logical_definition ancestors_of
## 1 includes codes descending from 123037004 | Body Structure <NA>
## 2 includes codes descending from 123037004 | Body Structure <NA>
## 3 includes codes descending from 123037004 | Body Structure <NA>
## 4 includes codes descending from 123037004 | Body Structure <NA>
## 5 includes codes descending from 123037004 | Body Structure <NA>
## 6 includes codes descending from 123037004 | Body Structure <NA>
## descendants_of code code_description all_codes concept_id
## 1 123037004 <NA> <NA> 123037004 4048384
## 2 123037004 <NA> <NA> 123037004 4002852
## 3 123037004 <NA> <NA> 123037004 36717763
## 4 123037004 <NA> <NA> 123037004 4230944
## 5 123037004 <NA> <NA> 123037004 42605189
## 6 123037004 <NA> <NA> 123037004 4097829
## concept_name domain_id
## 1 Body structure Spec Anatomic Site
## 2 Buccal embrasure Spec Anatomic Site
## 3 Skin structure of left lower eyelid Spec Anatomic Site
## 4 Adenofibrosis Observation
## 5 Intervertebral foramen of eighteenth thoracic vertebra Spec Anatomic Site
## 6 Transitional cell carcinoma Observation
## vocabulary_id concept_class_id standard_concept concept_code
## 1 SNOMED Body Structure S 123037004
## 2 SNOMED Body Structure S 110326006
## 3 SNOMED Body Structure S 719884003
## 4 SNOMED Morph Abnormality S 89115006
## 5 SNOMED Veterinary Body Structure S 336621000009107
## 6 SNOMED Morph Abnormality S 27090000
## valid_start_date valid_end_date invalid_reason
## 1 1970-01-01 2099-12-31 <NA>
## 2 1970-01-01 2099-12-31 <NA>
## 3 2017-01-31 2099-12-31 <NA>
## 4 1970-01-01 2099-12-31 <NA>
## 5 2014-01-31 2099-12-31 <NA>
## 6 1970-01-01 2099-12-31 <NA>
file <- file.path(getwd(), "data-raw", "snomed.csv")
if (!file.exists(file)) {
write_csv(x = snomed_omop_library, file = file)
}
## Error in open.connection(file, "wb"): cannot open the connection
loinc_library <- value_sets_by_vocab$LOINC %>% rubix::format_colnames() %>%
mutate_all(as.character) %>% mutate_all(trimws) %>% distinct()
head(loinc_library)
## # A tibble: 6 x 5
## value_set_name code_system logical_definiti… code code_description
## <chr> <chr> <chr> <chr> <chr>
## 1 TumorMarkerTes… LOINC <NA> 1695… 5-Hydroxyindoleacetate [M…
## 2 TumorMarkerTes… LOINC <NA> 3120… 5-Hydroxyindoleacetate [M…
## 3 TumorMarkerTes… LOINC <NA> 1692… 5-Hydroxyindoleacetate [M…
## 4 TumorMarkerTes… LOINC <NA> 1693… 5-Hydroxyindoleacetate [M…
## 5 TumorMarkerTes… LOINC <NA> 1694… 5-Hydroxyindoleacetate [M…
## 6 TumorMarkerTes… LOINC <NA> 7282… 5-Hydroxyindoleacetate [M…
loinc_omop_library <- chariot::join_on_concept_code(kind = "LEFT",
data = loinc_library, column = "code", where_in_concept_field = "vocabulary_id",
where_in_concept_field_value = "LOINC")
loinc_omop_library <- loinc_library %>% left_join(loinc_omop_library,
by = c("value_set_name", "code_system", "logical_definition",
"code", "code_description")) %>% distinct()
head(loinc_omop_library)
## # A tibble: 6 x 15
## value_set_name code_system logical_definit… code code_description concept_id
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 TumorMarkerTe… LOINC <NA> 1695… 5-Hydroxyindole… 3005148
## 2 TumorMarkerTe… LOINC <NA> 3120… 5-Hydroxyindole… 3023028
## 3 TumorMarkerTe… LOINC <NA> 1692… 5-Hydroxyindole… 3014670
## 4 TumorMarkerTe… LOINC <NA> 1693… 5-Hydroxyindole… 3021106
## 5 TumorMarkerTe… LOINC <NA> 1694… 5-Hydroxyindole… 3021385
## 6 TumorMarkerTe… LOINC <NA> 7282… 5-Hydroxyindole… 43055680
## # … with 9 more variables: concept_name <chr>, domain_id <chr>,
## # vocabulary_id <chr>, concept_class_id <chr>, standard_concept <chr>,
## # concept_code <chr>, valid_start_date <date>, valid_end_date <date>,
## # invalid_reason <chr>
file <- file.path(getwd(), "data-raw", "loinc.csv")
if (!file.exists(file)) {
write_csv(x = loinc_omop_library, file = file)
}
## Error in open.connection(file, "wb"): cannot open the connection
icd10cm_library <- value_sets_by_vocab$`ICD-10-CM` %>% rubix::format_colnames() %>%
mutate_all(as.character) %>% mutate_all(trimws) %>% mutate(code = str_replace_all(string = code,
pattern = "(^[A-Z]{1}[0-9A-Z]{2})([0-9A-Z]{1}.*$)", replacement = "\\1.\\2")) %>%
distinct()
head(icd10cm_library)
## # A tibble: 6 x 5
## value_set_name code_system logical_definiti… code code_description
## <chr> <chr> <chr> <chr> <chr>
## 1 CancerDisorder… ICD-10-CM <NA> C00.0 Malignant neoplasm of ext…
## 2 CancerDisorder… ICD-10-CM <NA> C00.1 Malignant neoplasm of ext…
## 3 CancerDisorder… ICD-10-CM <NA> C00.2 Malignant neoplasm of ext…
## 4 CancerDisorder… ICD-10-CM <NA> C00.3 Malignant neoplasm of upp…
## 5 CancerDisorder… ICD-10-CM <NA> C00.4 Malignant neoplasm of low…
## 6 CancerDisorder… ICD-10-CM <NA> C00.5 Malignant neoplasm of lip…
icd10cm_omop_library <- chariot::join_on_concept_code(kind = "LEFT",
data = icd10cm_library, column = "code", where_in_concept_field = "vocabulary_id",
where_in_concept_field_value = "ICD10CM")
icd10cm_omop_library <- icd10cm_library %>% left_join(icd10cm_omop_library,
by = c("value_set_name", "code_system", "logical_definition",
"code", "code_description")) %>% distinct()
head(icd10cm_omop_library)
## # A tibble: 6 x 15
## value_set_name code_system logical_definit… code code_description concept_id
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 CancerDisorde… ICD-10-CM <NA> C00.0 Malignant neopl… 35206047
## 2 CancerDisorde… ICD-10-CM <NA> C00.1 Malignant neopl… 35206048
## 3 CancerDisorde… ICD-10-CM <NA> C00.2 Malignant neopl… 35206049
## 4 CancerDisorde… ICD-10-CM <NA> C00.3 Malignant neopl… 35206050
## 5 CancerDisorde… ICD-10-CM <NA> C00.4 Malignant neopl… 35206051
## 6 CancerDisorde… ICD-10-CM <NA> C00.5 Malignant neopl… 35206052
## # … with 9 more variables: concept_name <chr>, domain_id <chr>,
## # vocabulary_id <chr>, concept_class_id <chr>, standard_concept <chr>,
## # concept_code <chr>, valid_start_date <date>, valid_end_date <date>,
## # invalid_reason <chr>
file <- file.path(getwd(), "data-raw", "icd10cm.csv")
if (!file.exists(file)) {
write_csv(x = icd10cm_omop_library, file = file)
}
## Error in open.connection(file, "wb"): cannot open the connection
chariot::dcAthena(conn = conn)