“Spliceman takes a set of DNA sequences with point mutations and computes how likely these single nucleotide variants alter splicing phenotypes,” write co-authors Kian Huat Lim, a graduate student, and William Fairbrother, assistant professor of biology, in an “application note” published online Feb. 10. It will appear in print in April.
Spliceman can be found at fairbrother.biomed.brown.edu/spliceman.
The software is based on research published last year in the Proceedings of the National Academy of Sciences, in which Fairbrother’s group used Spliceman to show that perhaps as many as a third of the disease-causing mutations in the Human Gene Mutation Database cause disease by disrupting gene splicing.
Splicing of RNA, based on instructions in DNA, is like film editing. A gene includes raw footage and instructions on how it should be edited to produce a protein. If the editing instructions are faulty, the scenes extracted from the raw footage may be spliced together in the wrong order, or the wrong scenes may be used.
Each of a person’s roughly 20,000 genes has, on average, about 20 splice sites. Sequences that regulate splicing often occur close to splice sites, and every possible “word” of DNA letters (e.g., AAA) has a signature distribution around splice sites. A point mutation turns one word into another: an A-to-T mutation, for example, could change “AAA” to “ATA.” In a normal genome, if AAA helps direct proper splicing, its average distance to the nearest splice site will be short; if ATA does not, its average distance will be longer. A mutation that turns a word typically found close to splice sites into one typically found far from them is of particular concern, because it is likely to disrupt splicing.
“The bigger the distance, the more likely that it affects splicing,” Lim said.
Spliceman makes its predictions about mutations by calculating that distance. It has successfully predicted the known effects of many mutations.
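The word-distance idea described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not Spliceman’s actual scoring method: it uses a tiny hypothetical data set (TOY_SEQUENCES) with hand-placed splice sites, three-letter words, and the simplified “average distance to the nearest splice site” measure from the AAA/ATA example. The function names word_distances, point_mutate and mutation_score are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (DNA sequence, 0-based splice-site positions).
# Not real genomic data; chosen so AAA sits at splice sites and ATA far away.
TOY_SEQUENCES = [
    ("AAACCCCCCATA", [0]),
    ("ATACCCCCCAAA", [9]),
]


def word_distances(sequences, k=3):
    """Record, for every k-letter word, the distance from the start of each
    occurrence to the nearest splice site in the same sequence."""
    distances = defaultdict(list)
    for seq, sites in sequences:
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            distances[word].append(min(abs(i - s) for s in sites))
    return distances


def point_mutate(word, index, base):
    """Return the word with a single nucleotide substitution, e.g. AAA -> ATA."""
    return word[:index] + base + word[index + 1:]


def mutation_score(distances, ref_word, alt_word):
    """Mean distance of the mutant word minus mean distance of the original.

    A large positive score means the mutation turns a word usually found near
    splice sites into one usually found far from them, which in this
    simplified model flags a possible splicing defect.
    """
    for word in (ref_word, alt_word):
        if not distances.get(word):
            raise KeyError(f"word {word!r} not observed in the reference set")
    return mean(distances[alt_word]) - mean(distances[ref_word])


if __name__ == "__main__":
    table = word_distances(TOY_SEQUENCES)
    mutant = point_mutate("AAA", 1, "T")  # the A-to-T example from the text
    print(mutant, mutation_score(table, "AAA", mutant))
```

In this toy set AAA always sits at a splice site and ATA always sits far from one, so the AAA-to-ATA change gets a large positive score. Real use would require genome-scale sequences with annotated splice sites.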
The software has genomic information for 11 species: humans, chimpanzees, rhesus monkeys, mice, rats, dogs, cats, chickens, guinea pigs, frogs, and zebrafish.
Fairbrother said he has already heard from colleagues and medical researchers who have been eager to integrate Spliceman into their efforts.
“I think it will mostly be used by medical geneticists seeking to understand the cause of disease,” he said.
One group that will use the software, Fairbrother said, is a Harvard-based multidisciplinary team led by genetics researcher Shamil Sunyaev, which is competing in this year’s Children’s Hospital Boston CLARITY challenge.
In the contest, competitors must discover the unknown genetic basis of rare disorders faced by three pediatric patients. Armed with the complete genome sequences of the patients and their parents, the team will use Spliceman to test the mutations and variants it finds for their ability to disrupt splicing.
The National Institutes of Health has funded Fairbrother’s research.