Work carried out by Dr Per Ola Kristensson of St Andrews’ School of Computer Science using “crowdsourcing” and social networking sites such as Twitter could make synthetic voice systems, such as that used by Stephen Hawking, faster and easier to use.
Crowdsourcing is a relatively new method of obtaining large amounts of statistical data through social media, but usually asks subjects to do simple tasks rather than use their creativity.
Together with colleague Dr Keith Vertanen of the Department of Computer Science at Princeton University in the United States, Dr Kristensson used the online sites to create a unique dataset that provides predictive text more like real speech.
Previously only small amounts of data were available from users of Augmentative and Alternative Communication (AAC) devices. Such devices enable users with communication disabilities to participate in everyday conversations.
These speech devices rely on statistical language models to improve text entry by offering word predictions. These predictions can be improved if the language model is trained on data that closely reflects the style of the users’ intended communications. Until now there was no large open dataset of AAC messages available.
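As a rough illustration of how a statistical language model can offer word predictions, the sketch below counts which words follow which in a handful of training sentences and suggests the most frequent continuations. It is a minimal example under stated assumptions, not the researchers' model: the function names `train_bigram_model` and `predict_next` and the toy training sentences are invented here for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    model = defaultdict(Counter)
    for sentence in sentences:
        # "<s>" marks the start of a sentence (an assumption of this sketch).
        words = ["<s>"] + sentence.lower().split()
        for prev_word, next_word in zip(words, words[1:]):
            model[prev_word][next_word] += 1
    return model


def predict_next(model, prev_word, prefix="", k=3):
    """Return up to k words seen after prev_word that start with prefix,
    most frequent first (ties broken alphabetically)."""
    candidates = model.get(prev_word.lower(), Counter())
    matches = [(w, c) for w, c in candidates.items()
               if w.startswith(prefix.lower())]
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [w for w, _ in matches[:k]]


# Toy training data, invented for illustration only.
training_sentences = [
    "can i have some water please",
    "can i have a blanket",
    "can we go eat",
    "i am cold",
    "i am pretty hungry",
]

model = train_bigram_model(training_sentences)
print(predict_next(model, "can"))             # words seen after "can"
print(predict_next(model, "i", prefix="h"))   # words after "i" starting with "h"
```

The closer the training sentences are to what a user actually wants to say, the more often the right word appears among the suggestions, which is why a dataset in the style of real AAC messages matters.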
However, Dr Kristensson’s work at St Andrews with Dr Vertanen, reported in the paper “The Imagination of Crowds” published by the Association for Computational Linguistics, demonstrated how crowdsourcing can be used to create a large set of fictional messages.
He revealed the work, funded by the Engineering and Physical Sciences Research Council, was sparked by his interest in online sites dedicated to sourcing information from the public such as Amazon Mechanical Turk. This site uses online volunteers to carry out simple tasks which computers can’t do, for instance transcribing scanned documents or rating the quality of photographs. However, the tasks are often very simple and Dr Kristensson wondered if there might be greater potential.
He said: “We wondered if we could also use these services to harness the creativity of the crowd. Can we design a task for these services that provides us with a large surrogate dataset of AAC messages?”
The initial collection of crowdsourced messages was then expanded by intelligently selecting similar sentences from Twitter, blog and Usenet data. The end result is a dataset much larger and of higher quality than anything that had been used by researchers in the past.
Compared to a model trained only on telephone transcripts, the researchers’ best performing model was able to reduce the number of keystrokes a user needed with a predictive keyboard by 5-11 per cent, which translates into faster conversations.
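To make the arithmetic behind a figure like this concrete, the short sketch below computes the percentage reduction in keystrokes between a baseline model and an improved one. The function name `percent_reduction` and the keystroke counts are hypothetical, chosen only to illustrate the calculation; they are not taken from the study.

```python
def percent_reduction(baseline_keystrokes, improved_keystrokes):
    """Percentage by which the improved model cuts keystrokes
    relative to the baseline model."""
    saved = baseline_keystrokes - improved_keystrokes
    return 100.0 * saved / baseline_keystrokes


# Hypothetical counts: keystrokes needed to enter the same set of messages
# with each model driving the predictive keyboard.
baseline = 50   # model trained only on telephone transcripts (invented figure)
improved = 46   # model trained on AAC-style data (invented figure)

print(percent_reduction(baseline, improved))
```

With these invented figures the improved model saves 4 of every 50 keystrokes, a reduction of 8 per cent, which falls inside the 5-11 per cent range reported above.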
Dr Kristensson and Dr Vertanen have now publicly released the data collection, word lists and best performing models for free. The hope is to use these models to design and test new interfaces that enable faster communication for users with communication difficulties.
Dr Kristensson said: “The work demonstrates that we can tap the creativity of users of social media and crowdsourcing technologies to help improve the lives of people unable to speak. Without the new web technologies it would not have been possible to collect this dataset.”
Note to Editors
- Is the dog friendly?
- Can I have some water please?
- I need to start making a shopping list soon.
- What I would really like right now is a plate of fruit.
- Who will drive me to the doctor’s office tomorrow?
- I am cold, is there another blanket?
- How did Pam take the news?
- Bring the fuzzy slippers here.
- Why are you so late?
- I am pretty hungry, can we go eat?
- I had bacon eggs and hashbrowns for breakfast.
“The Imagination of Crowds: Conversational AAC Language Modeling using Crowdsourcing and Large Data Sources”, published in the Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing by the Association for Computational Linguistics.
Issued by the Press Office, University of St Andrews
Contact Fiona MacLeod on 01334 462108 / 0771 414 0559.
Ref: (crowdsourcing 28/09/11)
View the University’s latest news at http://www.st-andrews.ac.uk/news/