MedCo Technical Documentation

Database

Administration with PgAdmin

PgAdmin can be accessed at http(s)://<node domain name>/pgadmin with the default username admin and password admin. To access the test database, create a server named MedCo with host postgresql, username postgres, and password postgres (the defaults). Note that PgAdmin is not included in the production deployments.
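
The server entry can also be pre-provisioned instead of created by hand. A minimal sketch, assuming the deployment uses the standard pgAdmin 4 Docker image (dpage/pgadmin4), which imports server definitions from a servers.json file on first startup:

{
  "Servers": {
    "1": {
      "Name": "MedCo",
      "Group": "Servers",
      "Host": "postgresql",
      "Port": 5432,
      "MaintenanceDB": "postgres",
      "Username": "postgres",
      "SSLMode": "prefer"
    }
  }
}

Mount the file at /pgadmin4/servers.json (or point the PGADMIN_SERVER_JSON_FILE environment variable at it); the password still has to be entered once in the UI.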

Managing Large Databases: Data Loading

Database modifications

Before loading a very large dataset, widen the observation_fact columns so they can hold values beyond the 4-byte integer range, and give PostgreSQL more memory for index builds:

-- structure: instance_num and text_search must hold values beyond the int4 range
ALTER TABLE i2b2demodata_i2b2.observation_fact
    ALTER COLUMN instance_num TYPE bigint,
    ALTER COLUMN text_search TYPE bigint;

-- settings: more memory for index builds, then reload the configuration
ALTER SYSTEM SET maintenance_work_mem TO '32GB';
SELECT pg_reload_conf();
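
A quick check against information_schema confirms the column change took effect (both columns should now report bigint):

SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'i2b2demodata_i2b2'
  AND table_name = 'observation_fact'
  AND column_name IN ('instance_num', 'text_search');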

Data generation

All of the following operations are implemented as PL/pgSQL functions present in the MedCo-i2b2 database.

Duplication

The parameter of the following functions is the number of times the existing observation_fact table is added to itself. For example, with 3, the number of rows in the table is multiplied by 4. A sketch of the temporary-table variant follows the list below.

  • Method 1 (double for loop)

SELECT i2b2demodata_i2b2.obs_fact_duplication_method_1(3);

  • Method 2 (temporary table)

SELECT i2b2demodata_i2b2.obs_fact_duplication_method_2(3);
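
The actual function bodies ship with the MedCo-i2b2 database. As an illustration only, the temporary-table variant could look roughly like the following hypothetical sketch (the shipped function may differ, notably in how it keeps primary keys unique):

CREATE OR REPLACE FUNCTION i2b2demodata_i2b2.obs_fact_duplication_sketch(times integer)
RETURNS void AS $$
DECLARE
    max_inst bigint;
BEGIN
    SELECT max(instance_num) INTO max_inst
    FROM i2b2demodata_i2b2.observation_fact;

    -- snapshot the original rows once, so copies are not duplicated again
    CREATE TEMPORARY TABLE obs_fact_snapshot ON COMMIT DROP AS
        SELECT * FROM i2b2demodata_i2b2.observation_fact;

    FOR i IN 1..times LOOP
        -- shift instance_num on every pass so each copy keeps a unique primary key
        UPDATE obs_fact_snapshot SET instance_num = instance_num + max_inst;
        INSERT INTO i2b2demodata_i2b2.observation_fact
            SELECT * FROM obs_fact_snapshot;
    END LOOP;
END;
$$ LANGUAGE plpgsql;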

Reduction

The parameter of the following function corresponds to the number of rows the resulting observation_fact table will have.

SELECT i2b2demodata_i2b2.obs_fact_reduction(2370000000);
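
After the call, a plain count verifies that the table was trimmed to the requested size:

SELECT count(*) FROM i2b2demodata_i2b2.observation_fact;
-- expected: 2370000000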

Indexes

The following command builds only the i2b2 indexes needed by MedCo. By default, i2b2 defines more of them, to enable features that MedCo does not currently support.

SELECT i2b2demodata_i2b2.obs_fact_indexes();
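
The exact index list is encoded in that function; as a hypothetical excerpt, the MedCo-relevant subset follows the usual i2b2 form (the authoritative list is the function body itself):

CREATE INDEX IF NOT EXISTS of_idx_clusteredconcept
    ON i2b2demodata_i2b2.observation_fact (concept_cd);
CREATE INDEX IF NOT EXISTS of_idx_encounter_patient
    ON i2b2demodata_i2b2.observation_fact (encounter_num, patient_num, instance_num);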

MedCo setup

PostgreSQL database files

In the docker-compose.common.yml of the running profile, add the following key to the postgresql service, pointing it to the location of the database files:

volumes:
  - PATH_TO_DATABASE_FILES/_data:/var/lib/postgresql/data
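
In context, the service definition would look something like this (the host path is only an example, e.g. one of the /srv/databases volumes restored from the backup server):

services:
  postgresql:
    # ... image, environment, etc. unchanged ...
    volumes:
      - /srv/databases/pg_volume_node0_x1212_indexes/_data:/var/lib/postgresql/data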

Timeouts

Depending on the size of the database, you might need to increase several timeouts, notably the following (an example of setting them is shown after the list):

  • I2B2_WAIT_TIME_SECONDS

  • SERVER_HTTP_WRITE_TIMEOUT_SECONDS

  • CLIENT_QUERY_TIMEOUT_SECONDS

  • ALL_TIMEOUTS_SECONDS
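
Assuming these are exposed as environment variables of the corresponding services, as the other settings in the deployment profiles are, raising them in the compose file could look like this (3600 seconds is an arbitrary example value):

environment:
  - I2B2_WAIT_TIME_SECONDS=3600
  - SERVER_HTTP_WRITE_TIMEOUT_SECONDS=3600
  - CLIENT_QUERY_TIMEOUT_SECONDS=3600
  - ALL_TIMEOUTS_SECONDS=3600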

Backed Up Data

Backup Server

  • SSH: icsil1-ds-49.epfl.ch; accounts: root / lca1

  • SSH keys of Mickaël and Joao are set up (ask one of them for access)

  • 20TB of storage in /srv/, publicly accessible read-only through the rsync daemon

  • 10Gbps link to the IC Clusters through interface 10.90.48.141

Data Stored in /srv/

/srv/databases

  • pg_volume_nodeX_x1212_indexes Databases of the 3 nodes (X=0,1,2), with approximately 150k patients and 28.5B records in total (50k patients and 9.5B records per node). It is a x1212 duplication of the original tcga_bio dataset, and it also contains the built indexes. Each is around 3TB in size, which could likely be reduced by running a PostgreSQL VACUUM FULL (see the command after this list).

  • pg_volume_node0_XB_indexes These databases are reductions of the pg_volume_node0_x1212_indexes database, with X = 4.75, 3.17 or 2.37 billion records. These sizes were computed to keep a total of 28.5B rows with 6, 9 and 12 nodes respectively.
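
The VACUUM FULL mentioned above rewrites the table into the minimum amount of space, at the cost of an exclusive lock and enough free disk to hold the rewritten copy:

VACUUM FULL i2b2demodata_i2b2.observation_fact;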

/srv/deployments

  • icclusters-deployment-backup-11-07-19 Contains all the deployment profiles and keys used for the pg_volume_nodeX_x1212_indexes databases.

  • postgresql-deployment Local deployment of PostgreSQL and pgAdmin, used to explore or modify the databases.

  • nebulaexp-alldeployments Contains all the deployment profiles and keys used for the pg_volume_node0_XB_indexes databases.

/srv/logs

  • duplications-nodeX Logs of the data duplications (parameters 100, 2 and 3, which by the parameter semantics above multiply the rows by 101×3×4 = 1212, matching the x1212 name) and of the index building for the pg_volume_nodeX_x1212_indexes databases.

  • reductions-node0 Logs of the data reduction (to 2.37B records) and of the index building for the pg_volume_node0_x1212_indexes database.

Copying data to/from IC-Cluster machines

Enabling the rsync daemon on a Linux machine

Using the rsync daemon makes copying data between machines easier. It is already enabled (read-only) on the backup server, but in some cases it can be useful on other machines as well.

In the file /etc/default/rsync, set the variable RSYNC_ENABLE=true. Then create the file /etc/rsyncd.conf with the following content, adapted to your case:

uid = root
gid = root

[disk]
path = /disk/
comment = MedCo data (read-only)
read only = yes

Then start the daemon with service rsync start.

Copy data with rsync

Example from an IC Cluster machine: rsync -a -v rsync://10.90.48.141/srv/databases/pg_volume_node0_x1212_indexes /disk/pg_volume
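
Since the daemon on the backup server is read-only, pushing data back onto it has to go over plain SSH instead; for example (hypothetical source path):

rsync -a -v /disk/pg_volume/pg_volume_node0_x1212_indexes root@icsil1-ds-49.epfl.ch:/srv/databases/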
