The first talk-and-type programs were pretty frustrating: a lot of corrections and not much output. Things got better over time. Programs like Dragon NaturallySpeaking changed the way they translated the spoken word, and other programs came along that used more processing power to deliver better speech recognition. Recently, with the advent of Apple’s Mountain Lion operating system and Google’s speech recognition service, dictating to your computer has become much easier.
I decided to give this new technology, or at least recently updated technology, a try, so this entire article was dictated using the software Apple built into the Mountain Lion operating system. My biggest difficulty so far is that I can’t think of anything to say.
Writing an article by talking instead of typing is not my normal mode of creation. I like to think about what I’m typing and a lot of writing, frankly, happens in rewriting. However, so far I’ve only had to make two corrections. This could really start to grow on me.
How they do it
Anyone who’s been watching the tech news lately will know that some people are up in arms about Apple’s privacy policies concerning the use of its voice assistant, Siri. While certainly not perfect, Siri has some major advantages. Using Siri in the car is one of the greatest inventions of all time, in my estimation. I can ask for a local phone number as I drive along, I can ask how far we are from our destination, and I can even check the weather, all without my hands leaving the wheel or my concentration leaving the road.
The newest Apple speech recognition built into Mountain Lion isn’t like Siri. Your machine does all the translating, using Apple’s software licensed from Nuance (makers of Dragon NaturallySpeaking), and it is so accurate because of better interpretation technology. Siri, on the other hand, works so well because your voice is uploaded on the fly to a huge data center for transcription. Bigger machines mean better translation of the spoken word. Google’s speech-to-text works the same way.
For privacy advocates, that represents a huge issue. Everything you say is forwarded to the Apple or Google data centers, worked on by large, powerful machines and then stored for future reference so that the service becomes more accurate the more it’s used. This means Apple has a record of every word I’ve spoken to Siri. I’m not sure I’ve ever related any embarrassing information to my phone, but it would be interesting to check those records myself. Until I lost consciousness due to boredom, that is.
Another big difference between these talk-to-text systems and those tried in the past is that words do not appear on the screen while you’re talking. I always found that incredibly distracting. I used to spend most of my time thinking about the mistakes being made instead of what I was trying to write/say. On Mountain Lion I can go for about 30 seconds before the speech buffer is full, or I can stop anytime by pressing the return key. Both the Google system and the Apple system wait until you are finished before trying to translate what you said.
The real deal
Both Apple and Google have spent years perfecting this technology. Apple has had voice control for the Mac since the ’90s. Google search using voice came out about two years ago. The technology has now matured to the point that it’s really convenient and almost hassle-free.
Although we all expected this moment to come long ago, it’s still nice that it has finally arrived. Speech recognition is available to everyone for little or no cost. Google doesn’t offer full voice input for all its applications yet, but that is sure to come. Apple’s implementation is system-wide: every application that accepts text input can use the speech recognition built into the operating system. It’s also available on iPhones and iPads. Google offers the service on its Nexus phone, where all you have to do is say “Google” to start the service. Most Android phones (and iPhones) have the Google voice app as well. For a start, this means less danger from texting while driving, which is a very welcome change. It also means that people who are disabled, or who just aren’t very good typists, will now be able to compose essays, books, letters or just emails to their heart’s content.
This is a good thing.
When you post someone else's work, it's polite to credit them. I'm thrilled that you liked the blog entry enough to re-post it. But do you think you might put where it came from?