Data Access

The NPM Phase I Data Access Committee (NPM DAC) has been established to oversee access to the SG10K_Pilot and SG10K_Health datasets to ensure:

  • Data is used appropriately according to NPM terms and conditions, including adherence to informed consent forms and ethical approvals for the data in question.
  • Data users are qualified investigators embedded within a recognised research-intensive organisation.

Interested applicants should read through the data access policies and data access forms listed below:

Click here to view the list of approved projects: 

Datasets

The SG10K_Pilot dataset refers to the joint variant calling of 4,180 whole-genome sequencing datasets deposited in the EGA database. All datasets have been pseudonymised and are therefore considered de-identified, as described in the paper. Two files are available for access: 1) the genotype data, arranged by chromosome in VCF format, and 2) a metadata file containing the self-reported ethnicity.

The SG10K_Health dataset is a collection of integrated genomic and phenotypic data from 10,000 healthy, consented individuals of Chinese, Malay and Indian ethnicities. The SG10K_Health data is contributed by six cohorts in Singapore: (1) Multi-Ethnic Cohort (MEC) study, (2) Health for Life in Singapore (HELIOS) study, (3) Growing Up in Singapore Towards healthy Outcomes (GUSTO) study, (4) TTSH Personalised Medicine Normal Controls (TTSH) study, (5) Singapore Epidemiology of Eye Diseases (SEED) study and (6) Biobank/SingHEART, SingHealth Duke-NUS Institute of Precision Medicine (PRISM) study.

S/N | Dataset | Description
1 | SG10K_Health metadata | A metadata file containing the self-reported ethnicity, sex and other research phenotypic variables.
2 | SG10K_Health VCF (r5.3) | Whole genome GATK joint variant calling of 9,770 individuals of Chinese, Indian and Malay ethnicities containing 179,418,971 variants.
3 | SG10K_Health DNA methylation array | Whole genome DNA methylation on Illumina Infinium Methylation EPIC array (850K)
4 | SG10K_Health Structural Variants (r1.4) | 73,035 structural variants derived from 5,487 SG10K_Health participants using Manta, MELT and SurVindel

 

Data Access Platform 

  • For the SG10K_Pilot dataset, approved researchers will be directed to the EGA portal to access the data.
  • For the SG10K_Health dataset, approved researchers will access the data via the RAPTOR platform. Learn more about the RAPTOR platform here.