Data Access
The NPM Phase I Data Access Committee (NPM DAC) has been established to oversee access to the SG10K_Pilot and SG10K_Health datasets to ensure:
- Data is used appropriately according to NPM terms and conditions, including adherence to informed consent forms and ethical approvals for the data in question.
- Data users are qualified investigators embedded within a recognised research-intensive organisation.
Interested applicants should read through the data access policies and data access forms listed below:
- SG10K_Health data access policy
- SG10K_Health data access form
- SG10K_Pilot data access policy
- SG10K_Pilot data access form
Click here to view the list of approved projects.
Datasets
The SG10K_Pilot dataset is the joint variant call set from 4,180 whole-genome sequencing samples, deposited in the EGA database. All datasets have been pseudonymised and are therefore considered de-identified, as described in the paper. Two files are available for access: 1) the genotype data, arranged by chromosome in VCF format, and 2) a metadata file containing the self-reported ethnicity.
The SG10K_Health dataset is a collection of integrated genomic and phenotypic data of 10,000 healthy and consented individuals of Chinese, Malay and Indian ethnicities. The SG10K_Health data is contributed by six cohorts in Singapore: (1) Multi-Ethnic Cohort (MEC) study, (2) Health for Life in Singapore (HELIOS) study, (3) Growing Up in Singapore Towards healthy Outcomes (GUSTO) study, (4) TTSH Personalised Medicine Normal Controls (TTSH) study, (5) Singapore Epidemiology of Eye Diseases (SEED) study and (6) Biobank/SingHEART, SingHealth Duke-NUS Institute of Precision Medicine (PRISM) study.
| S/N | Dataset | Description |
| --- | --- | --- |
| 1 | SG10K_Health metadata | A metadata file containing the self-reported ethnicity, sex and other research phenotypic variables. |
| 2 | SG10K_Health VCF (r5.3) | Whole genome GATK joint variant calling of 9,770 individuals of Chinese, Indian and Malay ethnicities containing 179,418,971 variants. |
| 3 | SG10K_Health DNA methylation array | Whole-genome DNA methylation profiled on the Illumina Infinium Methylation EPIC array (850K). |
| 4 | SG10K_Health Structural Variants (r1.4) | 73,035 structural variants derived from 5,487 SG10K_Health participants using Manta, MELT and SurVindel. |
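Both datasets pair VCF genotype files with a metadata file of self-reported ethnicity. As a rough illustration of how an approved researcher might relate the two, the sketch below reads the sample IDs from a VCF header and tallies them by ethnicity. The metadata layout (a tab-separated file with `sample_id` and `ethnicity` columns), the function names and the in-memory example data are all assumptions made for illustration; the actual file formats are defined by the released data, not by this sketch.

```python
import csv
import io
from collections import Counter


def read_vcf_samples(handle):
    """Return the sample IDs listed on the #CHROM header line of a VCF stream."""
    for line in handle:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # The VCF format fixes the first nine columns (CHROM to FORMAT);
            # every column after that is a sample ID.
            return fields[9:]
    raise ValueError("no #CHROM header line found")


def count_by_ethnicity(vcf_handle, metadata_handle):
    """Tally VCF samples by self-reported ethnicity.

    Assumes a hypothetical tab-separated metadata file with
    'sample_id' and 'ethnicity' columns.
    """
    samples = read_vcf_samples(vcf_handle)
    reader = csv.DictReader(metadata_handle, delimiter="\t")
    ethnicity = {row["sample_id"]: row["ethnicity"] for row in reader}
    # Samples missing from the metadata are counted as "unknown".
    return Counter(ethnicity.get(sample, "unknown") for sample in samples)


# Tiny made-up stand-ins for a per-chromosome VCF and a metadata file.
example_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
    "1\t12345\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\t1/1\n"
)
example_metadata = io.StringIO(
    "sample_id\tethnicity\n"
    "S1\tChinese\n"
    "S2\tMalay\n"
    "S3\tChinese\n"
)

counts = count_by_ethnicity(example_vcf, example_metadata)
print(dict(counts))
```

For real files, the same functions could be given handles from `open(...)` (or `gzip.open(..., "rt")` for compressed VCFs) in place of the in-memory examples.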
Data Access Platform
- For SG10K_Pilot dataset, approved researchers will be directed to the EGA portal to access the SG10K_Pilot data.
- For SG10K_Health dataset, approved researchers will access the data via the RAPTOR platform. Learn more about the RAPTOR platform here.