Do I have the gene sequence responsible for my mother’s cancer? Will I be as tall as my uncle? Will my children have fair hair, or dark? Will they have dimples? Who in my family might develop asthma or diabetes?
Unfortunately, there are few clear-cut answers, particularly in ethnic groups or populations whose genomes have not been well-studied. This is a particular problem when trying to translate genomic information into clinical applications and medical recommendations.
Researchers at the Stanford University School of Medicine and around the world are still working to understand which variations among the billions of nucleotides in our DNA are functionally important, and which are merely the result of the genetic noise inherent in the messy process of human reproduction.
Now, the international 1,000 Genomes Project has catalogued more than 38 million single-nucleotide variations, called SNPs, and several million other genetic changes in over 1,000 people from 14 populations in four main geographic regions: Europe, Africa, East Asia and the Americas. The results were published Oct. 31 in Nature.
“Studies like these will directly benefit some of our most medically underserved populations, while also helping all of us by increasing our knowledge of naturally occurring genetic variation,” said Carlos Bustamante, PhD, a professor of genetics at Stanford. “There’s a real need for these types of tools and resources.”
In particular, the researchers found that the relative prevalence of rare and common variants can differ widely between specific populations. For example, a rare genetic change (defined as one found in fewer than 0.5 percent of samples) linked to disease in one population is not likely to occur in another, and even more-common genetic variants (occurring in 1 to 2 percent of samples) may not be shared among populations in different parts of the world.
Bustamante is one of the principal investigators of the study. Gil McVean, PhD, a professor of statistical genetics at the University of Oxford, is the senior author. More than 100 institutions and hundreds of individual scientists are listed as co-authors on the paper.
During the analysis of data from the project’s pilot phase, which focused on Asia, Europe and Africa, Bustamante and colleagues urged the inclusion of populations from the Americas, including Puerto Ricans, Colombians, African Americans and Mexican Americans.
“We felt it was very important to include these traditionally underserved populations,” said Bustamante, noting that disease incidence can vary widely among what appear to be similar groups.
As one example, he pointed out that collaborator Esteban Burchard, MD, professor of biopharmaceutical sciences and medicine at the University of California-San Francisco, has shown that Puerto Ricans have a high incidence of asthma, while Mexicans have a low incidence. “Normally we would think of both these groups as ‘Hispanic,’ but they are on the extreme ends of the asthma spectrum,” Bustamante said. “Similarly, African Americans have a much higher incidence than white Americans of certain kidney diseases and diabetes. Now we can begin to delve deeply into the genetic causes of disease.”
Bustamante led a working group of the 1,000 Genomes Project that developed computer algorithms to study a genetic phenomenon called admixing, in which two or more formerly separate populations begin to mix. The group combined the results of multiple algorithms to deconstruct an individual’s ancestry based on recent genetic contributions, a technique they’ve termed “ancestry deconvolution.”
“We can see where an individual might have recent European ancestry versus African ancestry,” said Bustamante.
The new study builds upon the 2010 results of the 1,000 Genomes Pilot Project, which sequenced the whole genomes of 179 people in an attempt to identify all variations that occurred in at least 1 percent of the group. The new data provide a more comprehensive examination of so-called rare variants by combining low-coverage whole-genome sequencing (in which every nucleotide is sequenced only about five times, as compared to up to 100 times in high-coverage studies) with more in-depth analysis of specific portions of the genome.
Like many of the other researchers in the project, Bustamante and his colleagues plan to continue their participation in the effort. “For the next phase, the 1,000 Genomes Project has collected and is sequencing DNA samples from Peru and Barbados,” said Bustamante. “This is a really important collection, and it will provide many additional resources for other research groups.”
In addition to Bustamante, other Stanford researchers who participated in the study include postdoctoral scholars Jake Byrnes, PhD (now at Ancestry.com), Simon Gravel, PhD, Eimear Kenny, PhD, Jeffrey Kidd, PhD (now at the University of Michigan), and Andres Moreno-Estrada, MD, PhD; research associate Phil Lacroute, PhD (now at Natera); graduate student Brian Maples; and former graduate student Fouad Zakharia, PhD (now at Citibank).
The 1,000 Genomes Project is funded by multiple sources throughout the United States and internationally. Bustamante’s portion of the work was funded by a grant from the National Human Genome Research Institute to him and his collaborators at Cornell University to develop methods for analyzing data from the project.