
Loading Data

The current version offers two loading alternatives: (v0) loading of clinical and genomic data based on MAF datasets; and (v1) loading of generic i2b2 data. Currently, each loader supports one dataset:

v0: a genomic dataset (tcga_cbio publicly available in )
v1: the .

Future releases of this software will allow for other arbitrary data sources, provided that they follow a specific structure (e.g. BAM format).

Pre-Requisites

Download test data

From the medco-deployment folder, execute the resources/data/download.sh script to download the test datasets.

v0 (Genomic Data)

The v0 loader expects an ontology, with mutation and clinical data in the MAF format. For the ontology data you must use ~/medco-deployment/resources/data/genomic/tcga_cbio/clinical_data.csv and ~/medco-deployment/resources/data/genomic/tcga_cbio/mutation_data.csv. For the clinical data you can keep using the same two files or a subset of the data (e.g. 8_clinical_data.csv). More information about how to generate sample data files can be found below. Once the loading script has been executed, all the data is encrypted and ‘deterministically tagged’ in compliance with the MedCo data model.

Example

The following example loads data into a running MedCo development deployment (dev-local-3nodes), on node 0. To load the two other nodes of this profile, adapt the docker-compose service being run accordingly.

Explanation of the arguments:

Data Manipulation

Inside ~/medco-loader/data/scripts/ you can find a small Python application to extract (or replicate) data out of the original tcga_cbio dataset. You can decide which patients you want to include in your ‘new’ dataset, or simply pick a random sample.
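As a rough illustration of the "randomly pick a sample" option, the sketch below selects a few patients and keeps only their clinical and mutation rows. It is not the application shipped in ~/medco-loader/data/scripts/. The column name SAMPLE_ID, the helper names and the assumption that both files share one identifier column are hypothetical and must be adapted to the real file headers.

```python
import csv
import random

# Hypothetical identifier column assumed to be present in both files.
SAMPLE_COLUMN = "SAMPLE_ID"


def load_rows(path, delimiter=","):
    """Read a delimited file into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def sample_patients(clinical_rows, mutation_rows, k, seed=None):
    """Pick k patients at random and keep only their rows in both tables."""
    rng = random.Random(seed)  # a fixed seed makes the subset reproducible
    all_ids = sorted({row[SAMPLE_COLUMN] for row in clinical_rows})
    chosen = set(rng.sample(all_ids, k))
    clinical = [r for r in clinical_rows if r[SAMPLE_COLUMN] in chosen]
    mutations = [r for r in mutation_rows if r[SAMPLE_COLUMN] in chosen]
    return clinical, mutations
```

Filtering both tables with the same set of identifiers keeps the subset consistent: every mutation row that remains belongs to a patient who is still present in the clinical data.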

To check that it is working you can query for:

-> MedCo Genomic Ontology -> Gene Name -> BRPF3

For the small dataset 8_xxxx you should obtain 3 matching subjects (one at each site).

v1 (I2B2 Demodata)

The v1 loader expects an existing i2b2 database (in .csv format), which it converts into a form compliant with the MedCo data model. This involves encrypting and ‘deterministically tagging’ some of the data.
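To give an intuition for what ‘deterministic tagging’ provides, the sketch below uses a keyed hash (HMAC-SHA256) as a stand-in. This is not the mechanism used by MedCo, which this page does not describe. The function name and the key are hypothetical. The only point is the property: the same value always yields the same tag, so tagged values can be compared for equality without revealing the value itself.

```python
import hashlib
import hmac


def deterministic_tag(secret_key: bytes, value: str) -> str:
    """Return a keyed, deterministic tag for value (illustrative stand-in only)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key and concept codes, used only for demonstration.
key = b"example-key"
tag_a = deterministic_tag(key, "concept-1")
tag_b = deterministic_tag(key, "concept-1")
tag_c = deterministic_tag(key, "concept-2")
print(tag_a == tag_b)  # equal inputs give equal tags
print(tag_a == tag_c)  # different inputs give different tags
```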

List of input (‘original’) files:

all i2b2metadata files (e.g. i2b2.csv)
dummy_to_patient.csv
patient_dimension.csv
visit_dimension.csv
concept_dimension.csv
modifier_dimension.csv
observation_fact.csv
table_access.csv
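Before running the loader it can help to confirm that the files listed above are present. The following is a minimal sketch of such a check. The function name is hypothetical, and the i2b2metadata files are left out because their exact names are not enumerated here.

```python
from pathlib import Path

# File names taken from the list above; the i2b2metadata files
# (e.g. i2b2.csv) are not checked because they are not enumerated.
REQUIRED_FILES = [
    "dummy_to_patient.csv",
    "patient_dimension.csv",
    "visit_dimension.csv",
    "concept_dimension.csv",
    "modifier_dimension.csv",
    "observation_fact.csv",
    "table_access.csv",
]


def missing_input_files(data_dir):
    """Return the required file names that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]
```

An empty list means all seven files were found in the given directory.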

Dummy Generation

The provided example dataset files come with pre-generated dummy data. These are random dummy entries whose purpose is to prevent frequency attacks. For more information on how this dummy generation is done, please refer to ~/medco-loader/data/scripts/import-tool/report/report.pdf. In a future release, the generation will be done dynamically by the loader.
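The idea behind dummies can be sketched as follows: if some codes occur far more often than others, an observer who only sees how often each protected value occurs could guess which is which. Adding dummy entries flattens those counts. The sketch pads every code up to the most frequent one. This is an assumed, simplified strategy for illustration and not necessarily the procedure described in report.pdf; the function name is hypothetical.

```python
from collections import Counter


def dummies_needed(observed_codes):
    """Return, per code, how many dummy entries bring it up to the highest count."""
    counts = Counter(observed_codes)
    target = max(counts.values())  # the most frequent code sets the target
    return {code: target - n for code, n in counts.items()}


# Hypothetical observations: code "a" occurs 3 times, "b" once, "c" twice.
print(dummies_needed(["a", "a", "a", "b", "c", "c"]))
# prints {'a': 0, 'b': 2, 'c': 1}
```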

Example

Explanation of the arguments:

To check that it is working you can query for:

-> Diagnoses -> Neoplasm -> Benign neoplasm -> Benign neoplasm of breast

You should obtain 2 matching subjects.
