Synthetic SPO Data

This page will guide you through loading example synthetic data that follows the SPO (Swiss Personalized Oncology) ontology.

Pre-Requisite: Download test data

Execute the download script to download the test datasets.

cd ${MEDCO_SETUP_DIR}/test/data
bash download.sh spo_synthetic

Load the data into MedCo

A script is available to simplify loading the data. The example below uses a test-local-3nodes deployment running on localhost; adapt it to your own use case:

cd ${MEDCO_SETUP_DIR}/scripts
for NODE_NB in 0 1 2; do bash load-spo-i2b2-data.sh \
  ../test/data/spo-synthetic/node_${NODE_NB} \
  localhost i2b2medcosrv${NODE_NB} \
  medcoconnectorsrv${NODE_NB}; \
done
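
Before running the loop above, it can help to confirm that each per-node data directory exists, since the loading script relies on the download step having completed. The following is a minimal sketch and not part of the MedCo scripts: the function name `check_spo_node_dirs` is hypothetical, and it assumes the `node_0` to `node_2` directory layout used in the example above.

```shell
# Hypothetical helper (not shipped with MedCo): verify that the per-node
# data directories used by the loading loop exist.
# Usage: check_spo_node_dirs <base_dir> <node_nb>...
check_spo_node_dirs() {
  local base_dir="$1"
  shift
  local missing=0
  local node_nb
  for node_nb in "$@"; do
    if [ ! -d "${base_dir}/node_${node_nb}" ]; then
      # Report every missing directory, not only the first one.
      echo "missing: ${base_dir}/node_${node_nb}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For the example deployment this could be called as `check_spo_node_dirs ../test/data/spo-synthetic 0 1 2` from the scripts directory; a non-zero exit status means at least one directory is missing.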

v0 (Genomic Data)

The v0 loader expects an ontology together with mutation and clinical data in the MAF format. For the ontology data you must use ${MEDCO_SETUP_DIR}/test/data/genomic/tcga_cbio/clinical_data.csv and ${MEDCO_SETUP_DIR}/test/data/genomic/tcga_cbio/mutation_data.csv. For the clinical data you can reuse the same two files or a subset of the data (e.g. 8_clinical_data.csv). More information about how to generate sample data files can be found below. Once the loading script has been executed, all the data is encrypted and deterministically tagged in compliance with the MedCo data model.

How to use

Ensure you have downloaded the data before proceeding to the loading.

The following examples show how to load data into a running MedCo deployment. Adapt the commands to your use case.

Examples

Loading the three nodes on the dev-local-3nodes profile

export MEDCO_SETUP_DIR=~/medco \
    MEDCO_DEPLOYMENT_PROFILE=dev-local-3nodes
cd "${MEDCO_SETUP_DIR}/deployments/${MEDCO_DEPLOYMENT_PROFILE}"
docker-compose -f docker-compose.tools.yml run medco-loader-srv0 v0 \
    --ont_clinical /data/genomic/tcga_cbio/8_clinical_data.csv \
    --sen /data/genomic/sensitive.txt \
    --ont_genomic /data/genomic/tcga_cbio/8_mutation_data.csv \
    --clinical /data/genomic/tcga_cbio/8_clinical_data.csv \
    --genomic /data/genomic/tcga_cbio/8_mutation_data.csv \
    --output /data/
docker-compose -f docker-compose.tools.yml run medco-loader-srv1 v0 \
    --ont_clinical /data/genomic/tcga_cbio/8_clinical_data.csv \
    --sen /data/genomic/sensitive.txt \
    --ont_genomic /data/genomic/tcga_cbio/8_mutation_data.csv \
    --clinical /data/genomic/tcga_cbio/8_clinical_data.csv \
    --genomic /data/genomic/tcga_cbio/8_mutation_data.csv \
    --output /data/
docker-compose -f docker-compose.tools.yml run medco-loader-srv2 v0 \
    --ont_clinical /data/genomic/tcga_cbio/8_clinical_data.csv \
    --sen /data/genomic/sensitive.txt \
    --ont_genomic /data/genomic/tcga_cbio/8_mutation_data.csv \
    --clinical /data/genomic/tcga_cbio/8_clinical_data.csv \
    --genomic /data/genomic/tcga_cbio/8_mutation_data.csv \
    --output /data/

Loading one node on a network-test profile

export MEDCO_SETUP_DIR=~/medco \
    MEDCO_DEPLOYMENT_PROFILE=test-network-xxx-node0
cd "${MEDCO_SETUP_DIR}/deployments/${MEDCO_DEPLOYMENT_PROFILE}"
docker-compose -f docker-compose.tools.yml run medco-loader v0 \
    --ont_clinical /data/genomic/tcga_cbio/8_clinical_data.csv \
    --sen /data/genomic/sensitive.txt \
    --ont_genomic /data/genomic/tcga_cbio/8_mutation_data.csv \
    --clinical /data/genomic/tcga_cbio/8_clinical_data.csv \
    --genomic /data/genomic/tcga_cbio/8_mutation_data.csv \
    --output /data/

Explanation of the command's arguments

NAME:
    medco-loader v0 - Load genomic data (e.g. tcga_bio dataset)

USAGE:
    medco-loader v0 [command options] [arguments...]

OPTIONS:
    --group value, -g value               UnLynx group definition file
    --entryPointIdx value, --entry value  Index (relative to the group definition file) of the collective authority server to load the data
    --sensitive value, --sen value        File containing a list of sensitive concepts
    --dbHost value, --dbH value           Database hostname
    --dbPort value, --dbP value           Database port (default: 0)
    --dbName value, --dbN value           Database name
    --dbUser value, --dbU value           Database user
    --dbPassword value, --dbPw value      Database password
    --ont_clinical value, --oc value      Clinical ontology to load
    --ont_genomic value, --og value       Genomic ontology to load
    --clinical value, --cl value          Clinical file to load
    --genomic value, --gen value          Genomic file to load
    --output value, -o value              Output path to the .csv files

Test that the loading was successful

To check that the loading worked, you can query for:

-> MedCo Genomic Ontology -> Gene Name -> BRPF3

For the small dataset 8_xxxx you should obtain 3 matching subjects (one at each site).

v1 (I2B2 Demodata)

The v1 loader expects an existing i2b2 database (in .csv format), which it converts to be compliant with the MedCo data model. This involves encrypting and deterministically tagging some of the data.

List of input (‘original’) files:

  • all i2b2metadata files (e.g. i2b2.csv)

  • dummy_to_patient.csv

  • patient_dimension.csv

  • visit_dimension.csv

  • concept_dimension.csv

  • modifier_dimension.csv

  • observation_fact.csv

  • table_access.csv

How to use

Ensure you have downloaded the data before proceeding to the loading.

The following examples show how to load data into a running MedCo deployment. Adapt the commands to your use case.

Examples

Loading the three nodes on the dev-local-3nodes profile

export MEDCO_SETUP_DIR=~/medco \
    MEDCO_DEPLOYMENT_PROFILE=dev-local-3nodes
cd "${MEDCO_SETUP_DIR}/deployments/${MEDCO_DEPLOYMENT_PROFILE}"
docker-compose -f docker-compose.tools.yml run medco-loader-srv0 v1 \
    --sen /data/i2b2/sensitive.txt \
    --files /data/i2b2/files.toml
docker-compose -f docker-compose.tools.yml run medco-loader-srv1 v1 \
    --sen /data/i2b2/sensitive.txt \
    --files /data/i2b2/files.toml
docker-compose -f docker-compose.tools.yml run medco-loader-srv2 v1 \
    --sen /data/i2b2/sensitive.txt \
    --files /data/i2b2/files.toml

Loading one node on a network-test profile

export MEDCO_SETUP_DIR=~/medco \
    MEDCO_DEPLOYMENT_PROFILE=test-network-xxx-node0
cd "${MEDCO_SETUP_DIR}/deployments/${MEDCO_DEPLOYMENT_PROFILE}"
docker-compose -f docker-compose.tools.yml run medco-loader v1 \
    --sen /data/i2b2/sensitive.txt \
    --files /data/i2b2/files.toml

Explanation of the command's arguments

NAME:
    medco-loader v1 - Convert existing i2b2 data model

USAGE:
    medco-loader v1 [command options] [arguments...]

OPTIONS:
    --group value, -g value               UnLynx group definition file
    --entryPointIdx value, --entry value  Index (relative to the group definition file) of the collective authority server to load the data
    --sensitive value, --sen value        File containing a list of sensitive concepts
    --dbHost value, --dbH value           Database hostname
    --dbPort value, --dbP value           Database port (default: 0)
    --dbName value, --dbN value           Database name
    --dbUser value, --dbU value           Database user
    --dbPassword value, --dbPw value      Database password
    --files value, -f value               Configuration toml with the path of the all the necessary i2b2 files
    --empty, -e                           Empty patient and visit dimension tables (y/n)

Test that the loading was successful

To check that the loading worked, you can query for:

-> Diagnoses -> Neoplasm -> Benign neoplasm -> Benign neoplasm of breast

You should obtain 2 matching subjects.
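
In the three-node examples, the command is identical for each node except for the service name (`medco-loader-srv0`, `medco-loader-srv1`, `medco-loader-srv2`), so it can be wrapped in a loop. The following is a minimal sketch, assuming it is run from the deployment profile directory. The function name `run_loader_all_nodes` and the `MEDCO_DRY_RUN` variable are hypothetical and exist only for this illustration; the `docker-compose` invocation itself is the one shown in the examples.

```shell
# Hypothetical wrapper (not part of MedCo): run the same loader command on
# the three loader services of the dev-local-3nodes profile.
# All arguments are passed through unchanged to the loader.
# If MEDCO_DRY_RUN is set to a non-empty value (an assumption of this
# sketch), the commands are printed instead of executed.
run_loader_all_nodes() {
  local node_nb
  for node_nb in 0 1 2; do
    if [ -n "${MEDCO_DRY_RUN:-}" ]; then
      echo docker-compose -f docker-compose.tools.yml run "medco-loader-srv${node_nb}" "$@"
    else
      # Stop at the first node whose loading fails.
      docker-compose -f docker-compose.tools.yml run "medco-loader-srv${node_nb}" "$@" || return 1
    fi
  done
}
```

For instance, `MEDCO_DRY_RUN=1 run_loader_all_nodes v1 --sen /data/i2b2/sensitive.txt --files /data/i2b2/files.toml` prints the three v1 commands without running them, which is a cheap way to review them before loading.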

Data Loading

There are two ways of loading data into MedCo. The first uses the provided loader, which encrypts the data and loads the encrypted data into the MedCo database. The second loads pre-generated data directly into the database without encrypting it.

Load pre-generated data

Pre-generated cleartext synthetic data following the SPO (Swiss Personalized Oncology) ontology is available; follow the Synthetic SPO Data instructions to load it.

Encrypt and load data with the loader

The current version of the loader offers two loading alternatives: (v0) loading of clinical and genomic data based on MAF datasets; and (v1) loading of generic i2b2 data. Currently each of these two loaders supports one dataset:

• v0: a genomic dataset (tcga_cbio, publicly available in cBioPortal)

• v1: the i2b2 demodata.

Future releases of this software will allow for other arbitrary data sources, given that they follow a specific structure (e.g. BAM format).

Pre-Requisite: Download test data

Execute the download script to download the test datasets.

cd ${MEDCO_SETUP_DIR}/test/data
bash download.sh i2b2
bash download.sh genomic_orig
bash download.sh genomic_fake
bash download.sh genomic_small

Dummy Generation

The provided example dataset files come with pre-generated dummy data. These are random dummy entries whose purpose is to prevent frequency attacks. In a future release, the loader will generate them dynamically.
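
The download commands for the loader test datasets differ only in the dataset name, so they can be run in a loop that stops at the first failure. The following is a minimal sketch: the function name `download_test_datasets` is hypothetical, and it assumes that `download.sh` exits with a non-zero status when a download fails, which this page does not state.

```shell
# Hypothetical helper (not part of MedCo): call download.sh once per dataset
# name and stop at the first failure.
# Usage: download_test_datasets <data_dir> <dataset>...
download_test_datasets() {
  local data_dir="$1"
  shift
  local dataset
  for dataset in "$@"; do
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "${data_dir}" && bash download.sh "${dataset}" ) || {
      echo "download failed: ${dataset}" >&2
      return 1
    }
  done
}
```

With the datasets named on this page, the call would be `download_test_datasets "${MEDCO_SETUP_DIR}/test/data" i2b2 genomic_orig genomic_fake genomic_small`.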