The story of the masking of non-European genotypes in modern genomic studies

Author: Kate Hodgson
Editor: Sara Maria Majernikova
Photo Courtesy: Genome-wide association studies (GWAS)
Throughout history, health professionals have diagnosed patients based on clinical symptoms and the patients' own descriptions of them. In recent years, genome-wide association studies (GWAS) have helped scientists understand the genetic factors involved in various diseases. As a result, researchers are investigating biobank samples from many populations to identify specific genomic loci associated with a wide range of medical conditions and traits.
A genome-wide association study (GWAS) is a research technique used to discover genomic variants, typically single nucleotide polymorphisms (SNPs), associated with the risk of a disease or trait. The technique compares the genomes of individuals with the trait (cases) to those of individuals without it (controls) to determine which variants are significantly more common in cases. GWAS finds correlation, not causation, so an associated variant is not necessarily the one that causes the trait. The results are also only as valuable as the number and range of individuals studied. Despite these pitfalls, GWAS has become increasingly popular over the past two decades as a hypothesis-free approach that does not depend on pre-selected candidate genes to find genetic associations with health conditions. This popularity is largely due to the rapidly declining cost of sequencing and genotyping the human genome throughout the twenty-first century, with genome-wide genotyping now costing less than $50 USD. The first successful GWAS was published in 2002 and found associations between genomic variants and heart attacks. Following this publication, the popularity of the technique grew as researchers discovered ways of applying GWAS results in the clinic.
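The case-control comparison described above can be illustrated with a small sketch. It is a minimal example and not the pipeline of any particular study: the allele counts and SNP names are hypothetical, the function name `allelic_chi_square` is invented for illustration, and the cutoff of 5e-8 is the conventional genome-wide significance threshold, which the article itself does not mention. Real analyses use dedicated software and correct for confounders such as ancestry.

```python
import math

def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_totals = [case_alt + case_ref, control_alt + control_ref]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the allele were unrelated to case/control status
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Conventional genome-wide significance cutoff (an assumption, not from the article)
GENOME_WIDE_THRESHOLD = 5e-8

# Hypothetical counts: (case_alt, case_ref, control_alt, control_ref)
snps = {
    "snp_A": (620, 1380, 450, 1550),  # alternative allele enriched in cases
    "snp_B": (500, 1500, 495, 1505),  # nearly identical in cases and controls
}

results = {name: allelic_chi_square(*counts) for name, counts in snps.items()}
for name, (chi2, p_value) in results.items():
    verdict = "significant" if p_value < GENOME_WIDE_THRESHOLD else "not significant"
    print(f"{name}: chi2={chi2:.2f}, p={p_value:.2e} ({verdict})")
```

In this toy data only `snp_A` passes the threshold. A real GWAS repeats a test of this kind for hundreds of thousands of variants, which is why the threshold is so strict.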
Because GWAS focuses on genomic variants, it is vital to understand how our genomes differ across the planet. The story begins with the original population of Homo sapiens in East Africa. The founder effect is the loss of genetic diversity that occurs when a new population is established by a small number of individuals from a larger one. The founders carry only a subset of the original population's alleles, and alleles are lost rapidly because genetic drift has a stronger effect in small populations. This is exactly what occurred when our ancestors first left East Africa. As small groups left the original population to settle the rest of the world, heterozygosity (a measure of genetic diversity) decreased with distance from East Africa. Genetic diversity is therefore negatively correlated with distance from East Africa: African populations have the highest genetic diversity, and populations native to the Americas (the most recently settled continents) have the lowest. As a result, humans outside Africa lack many genetic variants that are found only in African populations.
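The link between repeated founding events and falling heterozygosity can be sketched numerically. The example below is an idealised illustration under stated assumptions, not a model of real human migration: it uses the standard expected heterozygosity formula H = 1 - sum(p_i^2) and the textbook result that sampling 2N gene copies reduces expected heterozygosity by a factor of (1 - 1/(2N)). The allele frequencies, founder count and number of events are hypothetical, and mutation, population growth and migration are ignored.

```python
def expected_heterozygosity(allele_freqs):
    """H = 1 - sum(p_i^2): the chance that two randomly drawn alleles differ."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)

def serial_founder_heterozygosity(h_source, founders, events):
    """Expected heterozygosity after each successive founding event.

    Each event draws 2 * founders gene copies from the previous population,
    which multiplies expected heterozygosity by (1 - 1 / (2 * founders)).
    """
    values = [h_source]
    for _ in range(events):
        values.append(values[-1] * (1.0 - 1.0 / (2 * founders)))
    return values

# Hypothetical source population with three alleles at one locus
source_h = expected_heterozygosity([0.5, 0.3, 0.2])

# Hypothetical chain of five colonies, each founded by 10 individuals
trajectory = serial_founder_heterozygosity(source_h, founders=10, events=5)
for step, h in enumerate(trajectory):
    print(f"founding events: {step}, expected heterozygosity: {h:.3f}")
```

Each step along the chain stands for a population founded further from the source, so the steadily shrinking values mirror the negative correlation between diversity and distance described above.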
Given that populations across the world differ in genomic diversity, it would be sensible to expect GWAS to include diverse genomes. This is far from the truth. A study by Martin et al. in 2018 revealed that 80% of all GWAS participants are white Europeans, even though Europeans make up only 16% of the world’s population. Predictions of the genomic variants associated with phenotypic traits are therefore not representative of other ethnic groups, a problem amplified by the fact that white Europeans have less genetic variation than African populations. GWAS results are used to calculate polygenic risk scores, which estimate an individual's genetic risk of disease. The study found that polygenic risk scores were far more accurate in Europeans than in non-Europeans. Unfortunately, the diversity of GWAS participants has not been improving, with non-European participation stagnating since late 2014.
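A polygenic risk score is, in its simplest form, a weighted sum: each variant's GWAS effect size is multiplied by the number of copies of the effect allele a person carries. The sketch below assumes that simple additive form. The variant names, effect sizes and genotypes are hypothetical, and the function `polygenic_risk_score` is written for illustration and is not taken from Martin et al. or from any library.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Sum of (effect-allele dosage x GWAS effect size) over the scored variants.

    Simplification: a variant with no genotype for this person counts as dosage 0.
    """
    score = 0.0
    for variant, beta in effect_sizes.items():
        score += beta * dosages.get(variant, 0)
    return score

# Hypothetical effect sizes (log odds ratios) reported by a GWAS
effect_sizes = {"rs_hyp_1": 0.20, "rs_hyp_2": -0.10, "rs_hyp_3": 0.05}

# Hypothetical genotypes: copies of the effect allele (0, 1 or 2) per variant
person = {"rs_hyp_1": 2, "rs_hyp_2": 1, "rs_hyp_3": 0, "rs_hyp_4": 2}

score = polygenic_risk_score(person, effect_sizes)

# Variants the person carries that the GWAS never assigned an effect size
unscored = set(person) - set(effect_sizes)

print(f"polygenic risk score: {score:.2f}")
print(f"variants ignored by the score: {sorted(unscored)}")
```

The variant `rs_hyp_4` contributes nothing because the hypothetical GWAS never reported it. This is one way a score built from a European cohort can miss risk variants that are common elsewhere, which is consistent with the lower accuracy in non-Europeans described above.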
The lack of diversity in genomic studies has led to the omission of many key variants associated with disease risk. A famous example is cystic fibrosis. For many years, it was assumed that this recessive genetic disorder was not present in African populations, because diagnostic kits only tested for the cystic fibrosis variants found in European populations. Cystic fibrosis does exist in African populations, but it is caused by a different variant. To prevent further misdiagnoses and to understand the influence of genetics on phenotypic traits, the diversity of GWAS participants must be increased. Some programmes, such as the All of Us Research Program, support sharing genomic data from multiple ancestral origins, but there needs to be a stronger incentive to recruit GWAS participants of non-European origin as the technique gains traction. Overall, this work provides groundbreaking results that expose the biases within previous GWAS, and the findings will allow researchers to examine human diseases through genetics with an unbiased eye.
