How many geneticists does it take to sequence a genome? The answer, it appears, is changing all the time. Next-generation sequencing machines decode organisms’ DNA at ever faster rates and scientists are reacting by reorganising their work architecture, all the way from practice to publication.
And it’s real-life, real-time problems that are catalysing changes in the ways that scientists solve problems and publish solutions. Take, for example, the outbreak of Shiga toxin-producing E. coli O104:H4 that hit Germany in May 2011; this, the worst outbreak of its kind on record, mobilised researchers across biological disciplines to find the microbial source of this deadly epidemic.
By the end of July, up to 50 deaths were attributable to haemolytic uraemic syndrome (HUS), a serious complication resulting in acute injury to the kidney and, on occasion, the brain and pancreas. In addition, more than 3000 people became sick and required significant hospital treatment, including blood transfusions and intensive care (Ref 1).
Epidemiologists eventually traced the outbreak to fenugreek seeds imported from Egypt. In the meantime, BBSRC-funded scientists Professor Mark Pallen and Dr Nick Loman from the University of Birmingham kickstarted a ‘crowdsourcing’ effort to analyse the genomic data from the outbreak strain.
Crowdsourcing involves releasing information – which can be anything from spreadsheets to images to genomic data – into the public domain and asking volunteers (who may still be professionals in their fields) to analyse it for a specific purpose. Examples include Nasa asking astronomers to find interesting particles in images from the Stardust probe, UK newspapers releasing files on MPs’ expenses, and the diplomatic cables released by WikiLeaks.
During the O104:H4 outbreak, traditional scientific publishing conventions were ignored as genomic data from relevant bacteria were released to the international bioscience community for analysis before being published in a peer-reviewed journal (Ref 2).
Within 24 hours of data release by the Chinese genomics centre BGI-Shenzhen, Loman had assembled the data and called on others to help analyse it. Two days later the strain had been assigned to an existing bacterial sequence type. It took just five days to prepare and release strain-specific primer sequences to diagnose the bug, and within a week more than 20 reports were freely available for analysis.
“The main point is that the various analyses were completed and released into the public domain, including to the public health physicians and news media, more quickly than would have been the case using a traditional approach, where all analysis would have been done in-house with data release only on publication,” says Pallen.
“The process has many advantages, including speed and tapping into the wisdom of the crowd – that means drawing on people with different knowledge, skills, attitudes, who are happy to question assumptions. In effect, you get real-time open peer-review of all that is going on,” he says.
The crowdsourcing was itself catalysed by the use of online social media, such as Twitter, blogs and a Wiki (an online repository that users can edit from anywhere). “Social media enables smart people to communicate plans and data much more freely and easily than traditional approaches,” says Pallen.
An Ion Torrent Personal Genome Machine as used by Pallen’s team. Image: Nick Loman
Of course, there are disadvantages to this approach too. “It’s hard to sort those who bring genuine insight from those jumping on the bandwagon,” says Pallen. “It’s hard to gain credit for such activities via traditional academic systems. In particular, it’s not easy to work out how to get these kinds of analyses into peer-reviewed high-impact research publications, although fortunately we did manage that here.”
This outbreak also saw the adoption of a wide range of sequencing technologies, including cheap, compact benchtop instruments accessible to the average research lab or department, such as Ion Torrent’s Personal Genome Machine. “This works out cheaper in terms of equipment and running costs than any previous next-generation instrument and has a shorter run time – three hours versus more than eight hours for 454 technology, or several days for Illumina machines,” says Pallen. “However, we can expect lively competition between several competing platforms in the future, including Illumina’s MiSeq platform.”
He also notes that even though the O104:H4 E. coli strain has probably been sequenced on more different platforms than any other organism, there is as yet no complete and final sequence. “In one sense the analysis is still not complete, as there is no closed circular chromosome.”
So what did the different teams around the world find? The analyses provided information on the O104:H4 strain’s virulence, its resistance genes and its phylogenetic lineage, which appeared to be related to progenitor strains (the enteroaggregative pathotype strain 55989) that have been reported from three continents: in Korea in 2005, Germany in 2001 and the Central African Republic in the 1990s (Ref 2).
The research also revealed a clutch of candidate adhesins that the bacteria may use to stick to foodstuffs and the human gut, although pinpointing their exact role in the pathology of the organism will take further research.
Overall, whether crowdsourced or not, it remains unclear whether genomic analyses can deliver the speed necessary to help patients during ongoing outbreaks. And, as Pallen and Loman note in a recent musing, there is no simple path from genome sequence to an understanding of virulence, resistance or transmissibility (Ref 2). Yet, one hopes that Sun Tzu’s edict “know your enemy” rings as true for future deadly bacterial outbreaks as it does on the battlefield.