Google Voice Failures: Lost in Transcription
I've been using Google Voice for a few years now, since before Google bought the service, when it still went by the name GrandCentral. One of the service's best features is its ability to transcribe voice messages and send them to your inbox, making it easy to keep up with incoming calls without disrupting what you're doing at any given moment. If those transcripts actually made sense most of the time, the service would be phenomenally useful.
Unfortunately, making heads or tails of Google Voice transcripts often requires a lot of guesswork, since the text I receive frequently bears little resemblance to the original message the caller left. In most cases I can at least see who called and get a vague sense of what they wanted. But at least a third of the transcripts I get are so riddled with errors that even the caller's identity is a mystery.
Here are a few examples of Google's handiwork. See if you can guess what the caller is saying, and then click the audio stream below the transcript to hear the actual message.
This first message is a pretty typical mix of accurate transcription and utter nonsense. Sure, I understand that I'm being invited to a barbecue, but I have no idea what "God for something like that" is supposed to mean, much less "Lots of there and like a giant bolt lock only style." Without listening to the voice recording, I'd be hard-pressed to respond appropriately to this message.
Fortunately, Google uses different shades of gray to indicate its confidence in the transcription. As a general rule, the more gray you see, the less sure you should be of the content. Of course, when the text is basically gibberish, that's a pretty good indicator too.
When the caller mumbles at all, the challenge is even greater, as in the case below.
"Wholesale of the alright"? I have to admit that I'm not 100 percent certain what that message actually said, but I'm pretty sure Google's guess was way off the mark.
Interestingly, longer messages tend to do a little better than shorter ones, perhaps because they give the service more opportunities to hit the mark. In the message above, Google gets a fair amount of the text right. But if it hadn't caught the keyword "brewery" and a reasonable approximation of the name of the beer my friend asked about, there would be absolutely no context from which to guess at the content of the message.
I suppose I could chalk mistranscriptions like "pays love the buried" and "the vitamin of there" up to my buddy's accent, but this message actually fares much better than many from people with typical California "TV Land" accents.
"Hit the macksey on?" Not sure what that means, but I have been meaning to catch Inception. Even without listening to the message, I'd be willing to wager that this caller--who is a colleague of mine (I use Google Voice as my main business number)--isn't actually inviting me to hold her during the show. Misconstruing that voicemail transcript could lead to a meeting with our HR director.
The message above is a pretty reasonable example of Google Voice at its best. I get what the caller is asking about without really reading the message, despite the fact that many of the actual words are wrong. Similarly, I can see at a glance that the call below is an invitation to go rafting.
Yep. After listening to the message, it's clear that the text is mostly right. But what makes this transcript interesting is that it seems to reveal something about Google's transcription algorithm. After the caller mentions that she's going rafting on Sunday, Google Voice hears a word--"Yosemite"--that sounds vaguely similar to "Sunday," and the service appears to assume that it's the same word. This sort of context-driven guessing pops up frequently in Google Voice blunders, occasionally creating coherent sentences that have nothing to do with the real message the caller left.