Analysis of the UK Biobank genetic and phenotypic data demonstrates the power of including a large population and detailed phenotyping in a prospective study to identify genetic and lifestyle factors related to health and disease.

Large prospective studies are critical for quantifying the contribution of genetic and environmental factors to the development and progression of human diseases1. UK Biobank is a study of over 500,000 British residents aged 40–69, which is when chronic diseases are most likely to manifest, who were recruited for a baseline examination between 2006 and 2010 (ref. 2). Designed and implemented as an open-access, large-scale research resource for investigators from around the world3, UK Biobank has already released its data to over 2,000 researchers worldwide who are examining a plethora of topics related to genetic and environmental determinants of common diseases.





In addition to a wide array of measures of physical function, lifestyle, and health at baseline; biochemical and genetic measures performed on stored samples; repeated assessments of subsets to assess measurement error; and follow-up for health outcomes through health-related record linkage, numerous subsequent enhancements to the dataset have been implemented. These include a multimodal clinical imaging assessment of 100,000 participants, which was begun in 2014 and will take 7 years to complete. A user-friendly data showcase and clearly defined, efficient procedures for requesting data access provide an exemplary model for making the resource available to researchers and students throughout the world. Participants provided consent for baseline data collection, storage, and future study of biospecimens; linkage to medical records; and the ability to be recontacted for further data collection.

The October 2018 issue of Nature includes two landmark UK Biobank papers describing the phenotypic and genomic data available in the resource (Bycroft et al.4) and initial genome-wide association studies (GWASs) based on the brain imaging phenotypes available (Elliott et al.5). Bycroft et al.4 not only provide a concise overview of the available phenotypic data but also present a detailed description of the genotyping methodology, content, and quality control that helps users analyze the data effectively. The authors analyze ancestral diversity (both genetic and self-reported) among the recruited UK residents, identify cryptic relatedness, and find that 30% of UK Biobank participants are inferred to have at least one third-degree or closer relative in the cohort. They also estimate haplotypes to help determine, for example, whether two identified variants are present in the same parental copy of a gene or one occurs in each copy. The authors carry out genotype imputation using a combination of reference panels and provide investigators with a dataset of over 96 million single-nucleotide polymorphisms (SNPs), short indels, and large structural variants for future analyses.
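To make the haplotype question concrete, the following minimal sketch classifies two heterozygous variants in one person as lying on the same parental copy (cis) or on opposite copies (trans). It is an illustration only, not the authors' phasing method: it assumes the genotypes have already been phased and are written in VCF-style notation such as `0|1`, and the function name `phase_relationship` is hypothetical. It also assumes that both genotypes come from the same individual and the same phased block.

```python
def phase_relationship(gt_a: str, gt_b: str) -> str:
    """Classify two phased heterozygous genotypes as 'cis' or 'trans'.

    Each genotype is a VCF-style phased string such as "0|1": the allele
    left of the bar sits on one parental haplotype, the allele right of
    the bar on the other (0 = reference allele, 1 = variant allele).
    Assumption: both genotypes belong to the same person and phase block.
    """
    a = gt_a.split("|")
    b = gt_b.split("|")
    # Unphased genotypes ("0/1") contain no bar and cannot be classified.
    if len(a) != 2 or len(b) != 2:
        raise ValueError("expected phased diploid genotypes like '0|1'")
    # The cis/trans question only arises when both sites are heterozygous.
    if sorted(a) != ["0", "1"] or sorted(b) != ["0", "1"]:
        raise ValueError("both genotypes must be heterozygous (0|1 or 1|0)")
    # Same orientation: both variant alleles share a haplotype (cis).
    return "cis" if a == b else "trans"


if __name__ == "__main__":
    print(phase_relationship("0|1", "0|1"))  # variants on the same copy
    print(phase_relationship("1|0", "0|1"))  # one variant on each copy
```

Without phase information, both cases look identical: the person is simply heterozygous at both sites.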

Availability of massive datasets such as these has stimulated (if not necessitated) methodologic innovations. Bycroft et al.4 describe the development of new file formats and programs to improve data compression and to facilitate fast multitrait GWASs and ‘phenome-wide’ association studies, in which genotypes are related to a vast array of phenotype characteristics in a search for novel gene–disease associations6. Furthermore, the authors described successful imputation in the most highly polymorphic region of the human genome (and the one with the most gene–disease associations), the major histocompatibility complex, in British participants of European ancestry and demonstrated the utility of this imputation by reproducing previously reported associations with self-reported immune-mediated diseases. Lastly, to demonstrate the potential of this massive dataset, they carried out a GWAS of height in over 340,000 unrelated European-ancestry participants and compared it to a recent meta-analysis of 250,000 European-ancestry individuals in the Genetic Investigation of Anthropometric Traits (GIANT) Consortium7. They were able to replicate the associations found in GIANT, typically at much stronger significance levels and at a higher resolution, which will facilitate identification of genes to be pursued in functional studies and to be considered in making biological inferences.
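Sample size alone is one plausible contributor to stronger significance levels, and a rough calculation shows the scale of the effect. The sketch below uses the standard approximation that a variant explaining a fraction q of trait variance yields an expected 1-degree-of-freedom chi-square statistic of about 1 + Nq/(1 − q). The value of q is hypothetical and chosen only for illustration; it is not taken from either study, and the calculation ignores other differences between the two analyses, such as phenotype measurement and imputation quality.

```python
import math


def expected_chisq(n: int, var_explained: float) -> float:
    """Approximate expected 1-df chi-square for a single-variant test.

    Assumes a linear regression of a quantitative trait on one variant
    that explains `var_explained` of the trait variance in `n` people.
    """
    return 1.0 + n * var_explained / (1.0 - var_explained)


def p_value_1df(chisq: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chisq / 2.0))


# Hypothetical variant explaining 0.01% of trait variance (an assumption).
Q = 1e-4

if __name__ == "__main__":
    for label, n in (("250,000", 250_000), ("340,000", 340_000)):
        stat = expected_chisq(n, Q)
        print(f"N = {label}: expected chi-square = {stat:.1f}, "
              f"p at that statistic = {p_value_1df(stat):.1e}")
```

Under these assumptions the expected statistic grows roughly in proportion to N, so the same variant is detected with a smaller p-value in the larger sample.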

Elliott et al.5 report a GWAS of roughly 8,400 British individuals of European ancestry from UK Biobank, analyzing complex traits from over 3,000 functional and structural brain-imaging phenotypes that have been generated and disseminated to users of the resource5. It should be noted that specialized phenotypes and other data generated by users of the resource, such as the brain phenotypes identified by the authors here, are returned to the resource for use in future analyses and thereby enhance its value; nearly 80 such datasets have been returned to date. The brain-imaging data were obtained from the initial release of roughly 10,000 participants’ neuroimaging data in February 2017, with replication performed using data released in January 2018 that were obtained from roughly 5,000 additional participants. The brain structural phenotypes analyzed by the authors included volumetric analyses of anatomical features, such as total gray matter volume and hippocampal volume. They used functional magnetic resonance imaging (fMRI) measures to indirectly identify the activity of specific brain regions and connectivity measures to define major white matter tracts or the strength of functional interactions between different cortical regions. Over half of the image-derived phenotypes (IDPs) showed significant heritability as measured by associated SNPs, with volumetric measures being the most heritable among the structural measures, and higher heritability for resting versus task-related fMRI phenotypes. They identified 38 distinct clusters of SNPs associated with similar IDPs. Many of the IDP-associated genes can be linked to mechanisms of brain development and plasticity, and some have previously been related to psychiatric disorders, such as major depression and schizophrenia. 
These initial insights set the stage for the more complete identification of genetic associations with brain structure and function to be carried out in the future with the full UK Biobank dataset of 100,000 participants.

What do these two studies mean for medical care and research? First, the availability of dense genotyping data and simultaneous collection of phenotyping data not only will enable the identification of large numbers of genetic associations (and their potential biological implications) with a vast array of traits, but also will permit the exploration of differences in these associations in important subgroups of individuals defined by factors such as environmental exposures. Examination of gene–environment interactions was in fact a major motivating factor for UK Biobank’s large size, and careful attention to measurement of environmental and lifestyle factors was made when collecting initial data8. Sociodemographics are another important factor in defining subgroups of individuals; one would hope that these initial analyses limited to persons of genetically inferred British ancestry will soon be extended to all participants regardless of ancestry, as UK Biobank includes over 9,400 persons of self-reported South Asian ancestry and over 7,600 with African ancestry. Though these subsets are dwarfed by the over 430,000 persons of British ancestry, they carry population-specific alleles that could provide valuable insights into disease associations. At a minimum, data from the two largest non-British-ancestry subgroups should be deeply explored and compared to datasets from their ancestral populations for insights into conditions and genetic variants that disproportionately affect them. It is to be hoped that the growing numbers of genetic investigators in South Asia and Africa, as well as other investigators interested in exploring the full range of human genetic diversity, will boldly embrace these UK Biobank data while using its methods to construct robust prospective cohorts of their own.

Second, development of reproducible measures of brain structure and function that can be assessed in this large number of people and distribution of these clinical imaging resources to other investigators raise the hope of identifying imaging markers of serious neuropsychiatric disorders before they have produced signs and symptoms, potentially at a stage where interventions can effectively prevent them. Although (for the most part) such preventive measures are not yet at hand, one may hope that research into effective therapies may come to fruition concurrently with research in UK Biobank and other studies to identify in whom best to apply them.

Finally, we should understand that the power of UK Biobank lies not only in its large sample size and dense phenotypic and genotypic characterization, but also in its success in engaging investigators from all over the world in jointly contributing to the analysis and enhancement of its vast data. Indeed, UK Biobank is exemplary in its data-sharing approach, continually finding new ways to make its data understandable, accessible, and interpretable to students and researchers alike. The ready availability of individual-level data (rather than summary statistics alone, as is more common in large studies) makes possible conditional and multitrait analyses, subgroup analyses, interaction analyses, and a host of other explorations that far exceed the capabilities of any one research group in a lifetime of research. It is by harnessing the power of the global collective research enterprise that UK Biobank will fully realize its enormous potential and fulfill its mandate of providing a premier resource for improving the health of future generations.


