Bethany Gooding
Half of the UK population now owns a smartphone, and 18.5% of those phones are iPhones (http://www.theguardian.com/technology/2011/oct/31/half-uk-population-owns-smartphone). This means a large number of people use Siri on a daily basis to compose texts, make phone calls, run Google searches and carry out other simple tasks. With the world's population becoming increasingly lazy, speech recognition software is being used more and more, making its accuracy a necessity.
To fully understand the language presented to it, Siri must have domain knowledge, discourse knowledge and world knowledge (Stanford Natural Language Processing), so it is unsurprising that errors are sometimes made in analysis. Apple's Siri, according to MacWorld, is almost never wrong, as it uses Natural Language Processing to analyse the syntactic structure of speech, extracting nouns, adjectives, verbs and intonation.
However, Rene Butler, in their report 'Improving Speech Recognition through Linguistic Knowledge', argues that speech recognisers are inaccurate because they pay no attention to the linguistic structure of speech. One such model for speech analysis, the N-gram model, only looks at a few preceding words, which makes interpretation inaccurate. We as social animals are able to understand language because we have prior knowledge of syntax and semantics, and we are also able to use exophoric clues from our situation. Siri, however, is unaware of the situation and so must base its understanding purely on the linguistic features of speech.
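To make the N-gram limitation concrete, here is a minimal sketch of a bigram model (an N-gram model with N = 2) that chooses between two homophones using only the single preceding word. The tiny corpus, the function names and the `<s>` start-of-sentence marker are all assumptions made for illustration; this is not how Siri or any particular recogniser is implemented.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # "<s>" is an assumed marker for the start of a sentence.
        words = ["<s>"] + sentence.lower().split()
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts

def probability(counts, prev, word):
    """Estimate P(word | prev) from the raw counts (no smoothing)."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0

def pick_candidate(counts, prev, candidates):
    """Choose the candidate word most likely to follow prev."""
    return max(candidates, key=lambda w: probability(counts, prev, w))

# Hypothetical training data, invented for this example.
corpus = [
    "turn right at the lights",
    "write a text to my mother",
    "call my mother",
    "text my brother",
]
model = train_bigrams(corpus)

# "write" and "right" sound identical; only the previous word separates them.
print(pick_candidate(model, "turn", ["write", "right"]))  # right
print(pick_candidate(model, "<s>", ["write", "right"]))   # write
```

Because the model sees only one preceding word, any context it never encountered in training gives every candidate a probability of zero, which is the kind of inaccuracy described above.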
To communicate with speech recognition software, we must adopt a controlled natural language which has a precisely defined syntax (Kaarel Kaljurand and Tanel Alumäe, 'Controlled natural language in speech recognition based user interfaces'). Language is largely ambiguous, and spoken input differs from written input: linguistic features such as homophones, abbreviations and punctuation can make spoken input difficult to distinguish and understand. To improve this understanding when communicating with, for example, Siri, we must allow for synonymous phrasing while using a grammar that avoids ambiguity, such as distinguishing between Hailsham town and Hailsham road.
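As a rough illustration of what a precisely defined syntax might look like, the sketch below accepts only a handful of command shapes and rejects everything else. The verbs, the pattern and the `parse` function are hypothetical and chosen for this example; they are not taken from Kaljurand and Alumäe's work or from Siri.

```python
import re

# An assumed, deliberately tiny grammar: a verb followed by one target,
# where a place name may carry the qualifier "town" or "road".
COMMAND = re.compile(
    r"^(?P<verb>call|text|navigate to) "
    r"(?P<target>[a-z]+(?: (?:town|road))?)$"
)

def parse(utterance):
    """Return (verb, target) if the utterance fits the grammar, else None."""
    match = COMMAND.match(utterance.strip().lower())
    if match is None:
        return None
    return match.group("verb"), match.group("target")

print(parse("Navigate to Hailsham road"))   # ('navigate to', 'hailsham road')
print(parse("Navigate to Hailsham town"))   # ('navigate to', 'hailsham town')
print(parse("Could you maybe ring someone"))  # None
```

The qualifier keeps Hailsham town and Hailsham road apart, while free-form phrasing falls outside the grammar and is rejected rather than guessed at.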
As well as needing a theory of language to understand us, the software must also be able to understand different accents. Pronunciation of phonemes and morphemes varies according to the speaker's region of origin, sex and speaking style. This increases the knowledge Siri must have to turn the phonemes in a speech signal into a textual representation of the language.
With technology becoming a larger part of people's lives each day, and individuals relying on it more and more, speech recognition may soon be the way that people control not just their phones but many other household objects. These devices must therefore integrate linguistic models into their analysis of speech to make it as accurate as possible. Are you able to use Siri for all that you need on your phone? What errors do language models need to fix?