PgAdmin can be accessed through http(s)://<node domain name>/pgadmin
with username admin
and password admin
(by default). To access the test database just create a server with the name MedCo
, the address postgresql
, username postgres
and password postgres
(by default). Note that PgAdmin is not included in the production deployments.
All the following operations are implemented through PL/pgSQL functions present in the MedCo-i2b2 database.
The parameter of the following functions corresponds to the number of times the existing observation_fact
table should be added to itself. For example with 3 , the number of rows of the table will be multiplied by 4.
Method 1 (double for loop)
Method 2 (temporary table)
The parameter of the following function corresponds to the number of rows the resulting observation_fact
table will have.
The following command builds only the i2b2 indexes needed by MedCo. I2b2 offers by default more of them to enable features not currently supported by MedCo.
In the docker-compose.common.yml
of the running profile, add in the postgresql
service the following key, pointing to where the database files are:
Depending on the size of the database, you might need to increase several timeouts, notably the following:
I2B2_WAIT_TIME_SECONDS
SERVER_HTTP_WRITE_TIMEOUT_SECONDS
CLIENT_QUERY_TIMEOUT_SECONDS
ALL_TIMEOUTS_SECONDS
SSH: icsil1-ds-49.epfl.ch ; accounts: root / lca1
SSH Key of Mickaël and Joao set up (ask one of them for access)
20TB storage in /srv/
, publicly accessible read-only through rsync daemon
10Gbps link with IC Clusters through interface 10.90.48.141
/srv/
/srv/databases
pg_volume_nodeX_x1212_indexes
Databases of 3 nodes (X=0,1,2
), with approximately 150k patients and 28.5B records total (50k patients and 9.5B records per node). It is a x1212 duplication of the original tcga_bio
dataset. It also contains the built indexes. The size of each is around 3TB, which maybe can be reduced by running a PostgreSQL FULL VACUUM.
pg_volume_node0_XB_indexes
Those databases are reductions of the pg_volume_node0_x1212_indexes
database, with X=4.75,3.17,2.37
billion records. Those numbers were calculated to keep a total number of 28.5B rows with respectively 6, 9 and 12 nodes.
/srv/deployments
icclusters-deployment-backup-11-07-19
Contains all the deployment profiles and keys used for the pg_volume_nodeX_x1212_indexes
databases.
postgresql-deployment
Local deployment of postgresql and pgAdmin, in order to explore or modify the databases.
nebulaexp-alldeployments
Contains all the deployment profiles and keys used for the pg_volume_node0_XB_indexes
databases.
/srv/logs
duplications-nodeX
Logs of data duplication (x100, x2, x3) and indexes building for the pg_volume_nodeX_x1212_indexes
databases.
reductions-node0
Logs of data reduction (to 2.37B records) and indexes building of pg_volume_node0_x1212_indexes
database.
Using the rsync daemon allows for easier data copy between machines. It is already enabled on the backup server (read-only), but in some cases it can be useful on other machines.
In the file /etc/default/rsync
set the variable: RSYNC_ENABLE=true
. Create the file /etc/rsyncd.conf
with the following content, adapted to your case:
Then start the daemon with service rsync start
.
Example from an IC Cluster machine: rsync -a -v rsync://10.90.48.141/srv/databases/pg_volume_node0_x1212_indexes /disk/pg_volume