How It Works: Speech Recognition
For years, speech recognition has been the poster child for technology that never lived up to its promise. Only three years ago, the products were expensive, inaccurate, and hard to use. That's changing. Fast PCs and ingenious software improvements mean that speech recognition technology finally offers real benefits. And it's appearing in places you might not have expected, including your mobile phone. Want to compose e-mail or surf the Web? All you'll have to do is talk.
Here's what you need to know:
A computer doesn't speak your language, so it must transform your words into something it can understand. A microphone converts your voice into an analog signal and feeds it to your PC's sound card. An analog-to-digital converter takes the signal and converts it to a stream of digital data (ones and zeros). Then the software goes to work.
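To make the analog-to-digital step concrete, here is a minimal Python sketch. It assumes a 16,000-samples-per-second rate and 16-bit samples, and the `digitize` and `tone` names are invented for illustration; none of this describes how any particular sound card actually works.

```python
import math

def digitize(signal, duration_s, sample_rate=16000, bits=16):
    """Sample a continuous signal and quantize each sample to a signed integer.

    `signal` is a function of time (seconds) that returns a value in [-1.0, 1.0].
    The sample rate and bit depth are illustrative assumptions.
    """
    max_level = 2 ** (bits - 1) - 1              # 32767 for 16-bit audio
    n_samples = int(duration_s * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                      # time of this sample
        value = max(-1.0, min(1.0, signal(t)))   # clip anything too loud
        samples.append(round(value * max_level)) # nearest integer level
    return samples

def tone(t):
    """Stand-in for the microphone: a pure 440-Hz tone."""
    return math.sin(2 * math.pi * 440 * t)

pcm = digitize(tone, duration_s=0.01)
print(len(pcm), pcm[:5])
```

Each integer in `pcm` is one measurement of the sound wave, and the recognition software works on this stream of numbers.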
While each of the leading speech recognition companies has its own proprietary methods, the two primary components of speech recognition are common across products. The first piece, called the acoustic model, analyzes the sound of your voice.
Here's how it breaks down your voice: First, the acoustic model removes noise and unneeded information such as changes in volume. Then, using mathematical calculations, it reduces the data to a spectrum of frequencies (the pitches of the sounds), analyzes the data, and converts the words into digital representations of phonemes, the basic sounds that make up spoken words.
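The volume and frequency steps can be sketched in a few lines of Python. This is only an illustration: it uses a textbook discrete Fourier transform, the function names are made up, and commercial products rely on far more elaborate (and proprietary) processing.

```python
import cmath
import math

def normalize(frame):
    """Scale a frame so its loudest sample has magnitude 1.0,
    which removes overall differences in volume."""
    peak = max(abs(x) for x in frame)
    return [x / peak for x in frame] if peak else list(frame)

def magnitude_spectrum(frame):
    """Naive discrete Fourier transform: how much of each frequency
    (bins 0 through N/2) is present in the frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        spectrum.append(abs(total))
    return spectrum

def dominant_frequency(frame, sample_rate):
    """Return the strongest frequency in the frame, in hertz (ignoring bin 0)."""
    spectrum = magnitude_spectrum(normalize(frame))
    peak_bin = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
    return peak_bin * sample_rate / len(frame)

SAMPLE_RATE = 8000  # assumed rate for this example
quiet = [0.3 * math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(256)]
loud = [0.9 * math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(256)]
print(dominant_frequency(quiet, SAMPLE_RATE), dominant_frequency(loud, SAMPLE_RATE))
```

The quiet and loud versions of the same 500-Hz tone produce the same answer, which is the point of discarding volume before analyzing pitch.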
For example, look at this sentence, which has been broken down into phonemes:
Now the second major component of speech recognition software,
the
Unfortunately, the English language complicates things. For example,
"there," "their," and "they're" all sound the same. A key to the power of
today's speech recognition is its use of
Speech recognition packages also tune themselves to the individual user. The software customizes itself based on your voice, your unique speech patterns, and your accent. To improve dictation accuracy, it creates a supplementary dictionary of the words you use.
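One way to picture the supplementary dictionary is as a running tally of words the user dictates that the stock vocabulary lacks. The Python sketch below is a guess at the general idea, not any vendor's method; the `UserVocabulary` class, its `min_uses` threshold, and the sample sentences are all invented for illustration.

```python
from collections import Counter

class UserVocabulary:
    """Hypothetical per-user word list built from corrected dictation."""

    def __init__(self, base_words, min_uses=2):
        self.base_words = {w.lower() for w in base_words}
        self.min_uses = min_uses      # assumed threshold before a word is learned
        self.counts = Counter()

    def observe(self, corrected_text):
        """Count every word that is not already in the base dictionary."""
        for word in corrected_text.lower().split():
            word = word.strip(".,;:!?\"'")
            if word and word not in self.base_words:
                self.counts[word] += 1

    def supplementary(self):
        """Words used often enough to add to the user's own dictionary."""
        return sorted(w for w, c in self.counts.items() if c >= self.min_uses)

vocab = UserVocabulary({"the", "meeting", "is", "at", "send", "to"})
vocab.observe("Send the Athlon specs to Kowalski.")
vocab.observe("Kowalski is at the meeting.")
print(vocab.supplementary())
```

Here only the name used twice clears the threshold, so one-off words don't clutter the user's dictionary.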
Dragon Systems, IBM, Lernout & Hauspie (L&H), and Philips are the major speech recognition companies in the PC arena. On March 28, however, L&H announced an agreement to purchase Dragon Systems. The company says it will continue to offer both product lines for the immediate future, which means L&H products will account for a large majority of speech recognition software sales: According to IDC analysts, Dragon Systems holds about 60 percent of the market, with IBM and L&H vying for second place.
Speech recognition's complexity pushes the limits of PC processing power. Although most packages will work with a 200-MHz Pentium, a 300-MHz or faster chip dramatically improves performance. New chips such as the Pentium III and the Athlon satisfy the applications' demand for power even better, and many high-end packages can take advantage of the PIII's multimedia extensions. And the more RAM, the better: Consider 64MB a practical minimum, with 128MB providing substantial improvements.
Most speech packages come with a basic headset microphone, but a better one from a third party can improve recognition.
The quality of your PC's sound card is also crucial. Cheap models won't
cut it because they produce distorted, low-quality output. While standard
16-bit sound cards work, a high-quality card that costs $100 to $150 will
offer better performance. Or you could try
Most of today's speech recognition packages also allow voice control of many Windows applications (find out from the vendor which programs the recognition software works with). The packages usually do this by converting spoken words into the appropriate text or commands and sending them to the application.
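In rough terms, the package has to decide whether each recognized phrase is a command or dictated text before passing it along. The Python sketch below assumes a tiny, made-up command table and a hypothetical `route` function; real products define their own command sets and talk to applications in their own ways.

```python
# Hypothetical table mapping spoken phrases to the keystrokes an application expects.
COMMANDS = {
    "file save": "Ctrl+S",
    "select all": "Ctrl+A",
    "new paragraph": "Enter Enter",
}

def route(utterance, commands=COMMANDS):
    """Classify a recognized phrase as a command or as dictated text.

    Returns ("command", keystrokes) when the phrase matches the table,
    otherwise ("text", utterance) so the words are typed as-is.
    """
    phrase = " ".join(utterance.lower().split())   # ignore case and extra spaces
    if phrase in commands:
        return ("command", commands[phrase])
    return ("text", utterance)

print(route("File Save"))
print(route("Dear Bob, thanks for the report."))
```

Because the application receives ordinary keystrokes or text either way, it doesn't need to know a voice was involved.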
Applications such as Word or Excel look for standard commands, and whether those commands come from a keyboard or your mouth doesn't matter. In addition, most speech recognition packages work with your browser, allowing you to "voice surf" the Web.
Voice surfing is just the start of what you'll be able to do. Dragon and L&H now offer portable digital voice recorders that download recordings to your PC when you get back to the office; your PC's speech recognition software transcribes your notes directly from the recorder.
Analysts say portable devices--such as Web-enabled mobile phones, which don't have standard keyboards--are next on the horizon. Rather than having full-fledged speech recognition, these devices will be tuned to a limited range of specific applications, such as getting stock updates.
For desktop PCs, the next major leap is three to five years away, when technologies such as natural language processing and artificial intelligence come to the consumer. Natural language processing analyzes the context of a word by looking at a whole sentence instead of a few words, resulting in greater accuracy.
Even more sophisticated (and perhaps frightening), artificial intelligence will allow computers to understand what you mean instead of just what you say. Speech packages will hold a discussion with you and will analyze the emotional aspects of your voice.





