Efforts to identify genetic differences between individuals previously concentrated on single-nucleotide polymorphisms (SNPs), single-letter differences in a person’s DNA that can be informative about a person’s disease or predisposition to a disease. More recently, however, it has been appreciated that each person’s genome also carries an enormous amount of structural variation: deletions, duplications, insertions, and inversions in the genetic sequence.
“There are many structural variants in everyone’s genomes and they are increasingly being associated with various aspects of human health,” said Charles Lee, PhD, a clinical cytogeneticist at BWH, associate professor at HMS, and co-chair of this project. “It is important to be able to identify and comprehensively characterize these genetic variants using state-of-the-art DNA sequencing technologies.”
The genetic sequences of 185 individuals were generated by the 1000 Genomes Project and comprehensively analyzed for structural variants (SVs) by 57 scientists from 26 institutions. The scientists quickly realized that conventional methods for detecting SNPs could not be applied to the identification of SVs, so 19 new computer programs and strategies had to be developed and tested to identify SVs more accurately. “The study found that no one program could comprehensively identify SVs and that each program had advantages and disadvantages that in some cases complemented other analytical programs,” said Matthew Hurles, DPhil, of the Wellcome Trust Sanger Institute and co-chair of the project.
The study found a total of 22,025 deletions and 6,000 other structural variants. “We have been given our first glimpses of the complete spectrum of human genetic variation – from 1 bp indels to larger copy number changes,” said Evan Eichler, PhD, a Howard Hughes Investigator at the University of Washington and co-chair of the project.
The study also provided important insights into how SVs are formed in the genome, thus linking SVs to mutational processes acting in the germline. “We found 51 hotspots where SVs, such as large deletions, appear to occur particularly often,” said Jan Korbel, PhD, a senior author of this study from the European Molecular Biology Laboratory in Heidelberg, Germany. “Six of those hotspots are in regions known to be related to genetic conditions, such as Miller-Dieker syndrome, a congenital brain disease that may lead to infant death.”
Data from this project are being made publicly available to the scientific community through the 1000 Genomes Project, which aims to sequence the genomes of 2,500 people by December 2012. The resource will be the largest collection of whole-genome DNA sequences freely available to researchers. The data may be accessed from the 1000 Genomes Project Data Coordination Center, a collaboration between the NIH National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI), at www.1000Genomes.org.
“Identifying SVs from DNA sequencing datasets is very challenging and it is gratifying to see the incredible progress that the SV group has made over the past 2 years,” said Richard Durbin, PhD, of the Wellcome Trust Sanger Institute and co-chair of the 1000 Genomes Project. David Altshuler, MD, PhD, of the Broad Institute, also a co-chair of the 1000 Genomes Project, added, “I am confident that this map will serve as an important resource for future sequencing-based disease association studies.”
Organizations that have committed major support for the project include Illumina; Life Technologies; the Wellcome Trust Sanger Institute; and the NHGRI, which supports the work being done at Baylor College of Medicine; Brigham and Women’s Hospital; Boston College; Broad Institute; Cold Spring Harbor Laboratory; Washington University in St. Louis; University of California, San Diego; University of Washington; and Yale University. Other institutions involved in this research include BGI-Shenzhen; Howard Hughes Medical Institute; Leiden University Medical Center; Louisiana State University; Max Planck Institute for Molecular Genetics; Mount Sinai School of Medicine; Roche; Simon Fraser University; Stanford University; University of Oxford; and University of Copenhagen.