Scientists in the U.S. and Korea have developed a statistical method for estimating the approximate number of DNA variations that influence different physical traits, such as height or childhood IQ, and a wide range of diseases, from diabetes to bipolar disorder.
Their findings, derived from analyses of existing genome-wide association studies (GWAS), suggest that any one trait or disease risk may be associated with up to tens of thousands of single nucleotide polymorphisms (SNPs) – many more than was previously thought – each of which has a minute individual effect, but which together have a significant impact on trait variability or disease risk.
“Depending on their sample sizes, previous genome-wide association studies have uncovered a few SNPs or many for any given disease or trait,” says Nilanjan Chatterjee, Ph.D., the Bloomberg distinguished professor in the department of biostatistics at the Bloomberg School of Public Health, Johns Hopkins University, and senior author of the team’s published paper in Nature Genetics. “But what they generally haven’t done is reveal the overall genetic architectures of diseases or traits – in other words, the likely number of SNPs that contribute and the distributions of their effect sizes.”
Using the new method, Dr. Chatterjee’s team also predicted the sample sizes needed to identify the majority of SNPs that underpin heritability of the diseases or traits included. Their results suggest that large-scale GWAS aimed at identifying individual risk variants, and deriving polygenic risk scores for estimating individual disease risk, may need to encompass millions of samples for some complex traits.
“In terms of practical results, we can now use this method to estimate, for any trait or disease, the number of individuals we need to sample in future studies to identify the majority of the important genetic contributions,” Dr. Chatterjee states. The team reports its findings in a paper titled, “Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits.”
GWAS designed to identify genetic variation that affects complex traits and diseases may include hundreds of thousands of individuals, and have allowed scientists to identify tens or hundreds of common susceptibility SNPs. The results from GWAS also suggest that common genetic variants may account for a greater share of heritability than can be explained by the SNPs identified to date.
The approach developed by the Bloomberg and Johns Hopkins team, and colleagues at Dongguk University, Seoul, makes it possible to estimate the potentially complex effect-size distribution of a trait based on statistics available from GWAS consortia. “We applied this method to analyze publicly available summary-level association statistics for 19 quantitative traits and 13 binary traits, to provide a large and comprehensive analysis of effect-size distributions underlying GWAS.” Traits included body mass index (BMI), height, body measurements, childhood obesity and IQ, cognitive performance and intelligence. The 13 major diseases included ranged from Alzheimer’s disease and asthma to coronary artery disease, rheumatoid arthritis, autism spectrum disorder, and neuropsychiatric disorders including bipolar disorder, major depressive disorder, and schizophrenia.
As well as confirming that many traits may be associated with many thousands of SNPs, the findings indicated that traits and diseases relating to mental health and ability, such as IQ, depression, and schizophrenia, were linked with the largest number of SNPs, reaching tens of thousands. “For the traits we analyzed related to mental health and cognitive ability, there is really a continuum of effect sizes, suggesting a distinct type of genetic architecture,” says Dr. Chatterjee, who has a joint appointment in Johns Hopkins Medicine’s department of oncology. “In general, anthropometric traits, psychiatric diseases, and traits related to intelligence, cognitive ability, and educational attainment were found to be most polygenic, each involving >10,000 underlying susceptibility SNPs,” the authors write.
In contrast, some common disorders such as heart disease and type 2 diabetes are influenced by fewer SNPs, albeit still many thousands, some of which have relatively large individual effects. “… some of the early growth traits, autoimmune disorders, and adult-onset chronic common diseases (for example, coronary artery disease, asthma, Alzheimer’s disease, type-2 diabetes) were less polygenic, although each still involved at least a few thousand underlying susceptibility SNPs,” the team notes.
The diversity of genetic architecture identified across the traits also implies “major differences in the future yield of GWAS,” the authors suggest. While the discovery of new SNPs for different traits will increase rapidly, the degree of genetic variance they explain will increase more slowly, because each newly discovered SNP is likely to have a smaller effect size. In other words, for many traits and diseases, including heart disease and diabetes, the point of diminishing returns for GWAS will start once a sample size reaches several hundred thousand. For the more polygenic psychiatric disorders and cognitive traits, the sample size will need to reach into the millions.
“The sample size needed to identify SNPs that can explain 80% of GWAS heritability is approximately 500,000 for some of the early growth traits, one million for adult height, between two and four million for various cholesterol- and obesity-related traits, and as high as six million for childhood intelligence quotient,” the authors suggest. “The sample size needed to identify SNPs that can explain 80% of GWAS heritability is between 200,000 and 400,000 for inflammatory bowel diseases, around 600,000 for rheumatoid arthritis, between 500,000 and one million for most common adult-onset chronic diseases, between 0.7 and 1.5 million for most psychiatric diseases, and up to 10 million for major depressive disorder.”
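The link between polygenicity and required sample size can be illustrated with a toy calculation. The sketch below is not the team's method and does not reproduce any of its estimates. It assumes, purely for illustration, that a trait's susceptibility SNPs have standardized effect sizes following a single normal distribution, that each SNP is tested independently at the conventional genome-wide significance level of 5e-8, and that linkage between SNPs can be ignored. The function name `gwas_yield`, the SNP counts, and the heritability value are all hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
ALPHA = 5e-8  # conventional genome-wide significance level
Z_CRIT = STD_NORMAL.inv_cdf(1 - ALPHA / 2)


def gwas_yield(n_samples, n_snps, heritability, grid=2000):
    """Toy model: return (fraction of SNPs detected, fraction of heritability explained).

    Assumes effect sizes ~ Normal(0, heritability / n_snps) on a standardized
    scale, so the test statistic of a SNP with effect beta is roughly
    Normal(sqrt(n_samples) * beta, 1).
    """
    sigma = (heritability / n_snps) ** 0.5
    scale = n_samples ** 0.5
    detected = explained = total = 0.0
    # Represent the effect-size distribution by evenly spaced quantiles
    # instead of random draws, so the result is deterministic.
    for k in range(grid):
        beta = sigma * STD_NORMAL.inv_cdf((k + 0.5) / grid)
        shift = scale * abs(beta)
        # Two-sided power to cross the significance threshold.
        power = STD_NORMAL.cdf(shift - Z_CRIT) + STD_NORMAL.cdf(-shift - Z_CRIT)
        detected += power
        explained += beta * beta * power
        total += beta * beta
    return detected / grid, explained / total


# Hypothetical traits: same heritability, tenfold difference in SNP count.
for label, n_snps in (("less polygenic", 2_000), ("more polygenic", 20_000)):
    for n in (10**4, 10**5, 10**6):
        found, h2_share = gwas_yield(n, n_snps, heritability=0.3)
        print(f"{label:15s} n={n:>9,d}  SNPs found={found:.3f}  heritability explained={h2_share:.3f}")
```

Under these assumptions, spreading the same heritability across more SNPs shrinks each individual effect, so the more polygenic trait needs a larger sample to explain the same share of heritability.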
While large-scale GWAS have great potential to uncover SNPs and gene variants that explain the heritable nature of common complex traits and diseases, the approach may not be realistic for investigating the genetic basis of rarer disorders. “… for rare diseases and difficult- or expensive-to-ascertain traits, it is not clear what is realistically achievable …” at least partly because adequate sample sizes may be difficult to reach. Nevertheless, Dr. Chatterjee concludes, “Our approach at least provides the best available roadmap of what is needed in future studies.”