Comment on page
Database
PgAdmin can be accessed through
http(s)://<node domain name>/pgadmin
with username admin
and password admin
(by default). To access the test database just create a server with the name MedCo
, the address postgresql
, username postgres
and password postgres
(by default). Note that PgAdmin is not included in the production deployments.
PgAdmin server configuration.
-- structure
ALTER TABLE i2b2demodata_i2b2.observation_fact
ALTER COLUMN instance_num TYPE bigint,
ALTER COLUMN text_search TYPE bigint;
-- settings
ALTER SYSTEM SET maintenance_work_mem TO '32GB';
SELECT pg_reload_conf();
All the following operations are implemented through PL/pgSQL functions present in the MedCo-i2b2 database.
The parameter of the following functions corresponds to the number of times the existing
observation_fact
table should be added to itself. For example with 3 , the number of rows of the table will be multiplied by 4.- Method 1 (double for loop)
SELECT i2b2demodata_i2b2.obs_fact_duplication_method_1(3);
- Method 2 (temporary table)
SELECT i2b2demodata_i2b2.obs_fact_duplication_method_2(3);
The parameter of the following function corresponds to the number of rows the resulting
observation_fact
table will have.SELECT i2b2demodata_i2b2.obs_fact_reduction(2370000000);
The following command builds only the i2b2 indexes needed by MedCo. I2b2 offers by default more of them to enable features not currently supported by MedCo.
SELECT i2b2demodata_i2b2.obs_fact_indexes();
In the
docker-compose.common.yml
of the running profile, add in the postgresql
service the following key, pointing to where the database files are:volumes:
- PATH_TO_DATABASE_FILES/_data:/var/lib/postgresql/data
Depending on the size of the database, you might need to increase several timeouts, notably the following:
I2B2_WAIT_TIME_SECONDS
SERVER_HTTP_WRITE_TIMEOUT_SECONDS
CLIENT_QUERY_TIMEOUT_SECONDS
ALL_TIMEOUTS_SECONDS
- SSH: icsil1-ds-49.epfl.ch ; accounts: root / lca1
- SSH Key of Mickaël and Joao set up (ask one of them for access)
- 20TB storage in
/srv/
, publicly accessible read-only through rsync daemon - 10Gbps link with IC Clusters through interface 10.90.48.141
pg_volume_nodeX_x1212_indexes
Databases of 3 nodes (X=0,1,2
), with approximately 150k patients and 28.5B records total (50k patients and 9.5B records per node). It is a x1212 duplication of the originaltcga_bio
dataset. It also contains the built indexes. The size of each is around 3TB, which maybe can be reduced by running a PostgreSQL FULL VACUUM.pg_volume_node0_XB_indexes
Those databases are reductions of thepg_volume_node0_x1212_indexes
database, withX=4.75,3.17,2.37
billion records. Those numbers were calculated to keep a total number of 28.5B rows with respectively 6, 9 and 12 nodes.
icclusters-deployment-backup-11-07-19
Contains all the deployment profiles and keys used for thepg_volume_nodeX_x1212_indexes
databases.postgresql-deployment
Local deployment of postgresql and pgAdmin, in order to explore or modify the databases.nebulaexp-alldeployments
Contains all the deployment profiles and keys used for thepg_volume_node0_XB_indexes
databases.
duplications-nodeX
Logs of data duplication (x100, x2, x3) and indexes building for thepg_volume_nodeX_x1212_indexes
databases.reductions-node0
Logs of data reduction (to 2.37B records) and indexes building ofpg_volume_node0_x1212_indexes
database.
Using the rsync daemon allows for easier data copy between machines. It is already enabled on the backup server (read-only), but in some cases it can be useful on other machines.
In the file
/etc/default/rsync
set the variable: RSYNC_ENABLE=true
. Create the file /etc/rsyncd.conf
with the following content, adapted to your case:uid = root
gid = root
[disk]
path = /disk/
comment = MedCo data (read-only)
read only = yes
Then start the daemon with
service rsync start
.Example from an IC Cluster machine:
rsync -a -v rsync://10.90.48.141/srv/databases/pg_volume_node0_x1212_indexes /disk/pg_volume