Sunday, May 20, 2012

Full Hands-On With Samsung’s S Voice From The Galaxy S III

Here's a detailed explanation of the voice recognition of the Galaxy S III phone....

S Voice is Samsung’s entry into the fledgling “virtual assistant” market currently occupied by Siri, Evi, Speaktoit Assistant, Vlingo, and a handful of others. The Galaxy S III ROM leaked earlier today, and while most of the stuff in it is broken and completely useless without the version of TouchWiz it’s meant to run on, S Voice is a perfect combination of being interesting AND working. So we’re going to take a look at it on my Galaxy Nexus, which is currently running vanilla (well, AOKP) 4.0.4.

S Voice is only meant to run on the GSIII’s TouchWiz build, so I expect lots of force closes. It’s not meant to do this, so I won’t be too hard on it when it stumbles. But hey, this should be a great indicator of how good the intent handling is!

First though, we’ve got to install it (mirror #1, mirror #2):

[Screenshots]

And oh my lord, those permissions do not mess around. It’s totally justified, though: S Voice needs all this because it can control just about everything on your phone.

[Screenshots]

After installation, S Voice immediately spills the beans with a “Hey, this is based on Vlingo!” Terms of Service page. Afterward you are presented with a “Voice assistant for dummies” walkthrough, which I have dutifully taken screenshots of. The one incredibly neat feature this informs you of is the ability to have a verbal “wake command” for the voice recognition. By default, “Hi Galaxy” will take the place of pressing the listen button.

It also informs you that “You can say ‘Hi Galaxy’ when in S Voice or locked screen to automatically wake up S Voice.” Now I’m not sure if “locked screen” means “with the screen on at the lock screen” or “anytime the phone is off.” The latter would be really cool. Either way it’s a special GSIII-only feature; “Hi Galaxy” only works in-app on my Gnex.

There is also an extremely scary “Note: Wake-up command may consume battery” message. So the microphone really is just listening all the time. (Where’s my tin foil hat?)

After the tutorial, you are presented with the full list of abilities for S Voice (any punctuation and capitalization weirdness is Samsung’s fault):

  • Voice Dial - “Call Charlie mobile”
  • Text Message - “Text Katie message are you free tonight for dinner?”
  • Search Contacts - “Look up James”
  • Navigate - “Navigate to Cambridge, MA”
  • Memo - “Memo Send mom a card”
  • Schedule - “New Event Lunch with James July 21st at 1PM”
  • Task - “New task Finish project”
  • Music - “Play playlist my favorites”
  • Social update - “”Twitter update” Why do humans live so far North?”
  • Search - “Search Bonobo apes”
  • Open App - “Open Calculator”
  • Record voice - “Record voice”
  • Driving mode - “Driving mode on/off”
  • Set Alarm - “Set alarm for 6:00 AM”
  • Timer - “Set timer for 1 minute”
  • Weather - “What is the weather for today?”
  • Simple settings controls - “Turn Wi-Fi on”
  • Get an answer - “How tall is Mount Everest?”
  • Local Listing - “Find restaurants”
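The fixed phrasing in these examples suggests a keyword-first grammar. As a rough illustration only (this is not Samsung's or Vlingo's actual code; the prefix table and the `parse_command` name are made up for this sketch), matching a transcript against leading keywords could look something like this in Python:

```python
# Hypothetical keyword-first command matcher. The prefixes come from the
# example phrases in the list above; the action names are assumptions.
COMMAND_PREFIXES = {
    "call": "voice_dial",
    "text": "text_message",
    "look up": "search_contacts",
    "navigate to": "navigate",
    "memo": "memo",
    "new event": "schedule",
    "new task": "task",
    "search": "search",
    "open": "open_app",
    "set alarm for": "set_alarm",
    "set timer for": "timer",
}


def parse_command(transcript):
    """Return (action, argument), or (None, transcript) if nothing matches."""
    text = transcript.strip()
    lowered = text.lower()
    # Try longer prefixes first so multi-word keywords are matched whole.
    for prefix in sorted(COMMAND_PREFIXES, key=len, reverse=True):
        # Require a word boundary after the keyword ("text" must not match "textbook").
        if lowered == prefix or lowered.startswith(prefix + " "):
            argument = text[len(prefix):].strip()
            return COMMAND_PREFIXES[prefix], argument
    return None, text
```

With this sketch, `parse_command("Call Charlie mobile")` yields the `voice_dial` action with `"Charlie mobile"` as its argument, while a phrase that does not begin with a known keyword comes back unmatched.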

[Screenshots]

After the hand holding is over, we are presented with this screen and it’s time to get down to business! The little button on the left will mute the phone, meaning it will turn off the text-to-speech responses (like when it reads your messages back to you), and the “?” button will bring up the list of actions.

So how does it work? Well, it’s important to keep in mind it’s not designed for this hardware, but man, the voice recognition is terrible. It’s nowhere near the same league as Voice Actions. Hopefully on the GSIII things go a little smoother. For now we’ll just focus on S Voice’s abilities.

For starters, this “Hi Galaxy” wake command is crazy sensitive. Even if you casually bust it out mid-sentence, the phone will pick up on it and start recording. It’s awesome. You can even say “Hi Galaxy [command]” without waiting for it to respond and it will handle everything. It’s very impressive.

First up is Voice Dialing, which works great, but it isn’t very photogenic. You say “Call bob” and it jumps into the dialer, so I really have nothing to show you a picture of. Next!

[Screenshots]

Text messaging sort of works. Like I said, the voice recognition isn’t so hot… I also don’t understand why, in the first image, it completely drops my “Hey” from “Hey I’m testing…” It detects it and then just forgets about it.

The usage flow is great though. Once you say your text, you are presented with a cancel/send dialog, which you can answer with a tap or verbally. You can specify name only (picture 3, which was supposed to be “Send text to Artem”) and it will respond with “What is your message?” It’s very cool.

Every left-pointing speech bubble you see in the pictures (except for “Say ‘Hi Galaxy’” and the initial “What would you like to do”) is read aloud with a text-to-speech engine. There’s no TTS built in; it uses whatever you have on your phone, which in my case is the dreary default Google TTS. Maybe the GSIII comes with something nicer.

Searching Contacts is totally busted. “Look up [contact name]” always does a web search; you probably need TouchWiz contacts for it to work. Navigation works great, but again, there isn’t much to show.

[Screenshots]

I really can’t make this thing not say “Ass Voice.” I swear it isn’t me - I just don’t think anyone told it its name. They should really fix that. Voice recognition is just all-around bad; it took about 4 tries to get anything close to recognizable.

Again we see it dropping words it previously recognized for no reason. Originally, it recognized “Don’t forget to make an ass voice post…” Then it inexplicably drops “Don’t forget to make” once it determines it’s a memo (with absolutely no input from me). I want my memo to be exactly what I said. The combination of bad recognition and unwanted editing makes this very frustrating. Again though, the usage flow is awesome: you can say “new memo” and it will ask what you would like to write.

[Screenshots]

COOL! Calendar entries are scanned for conflicts. That’s pretty awesome. The voice recognition is so terrible though. This took about 15 tries. The “What time should I schedule this for?” prompt won’t accept “All Day” as an answer, which is another problem. It’s also terrible at natural language. You can’t say “On June 27th schedule Google I/O” or really anything that deviates from the example. New event entries are the #1 thing I want from a voice assistant. Entering them by hand is tedious.
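For the curious, a conflict scan like this boils down to an interval-overlap test. Here is a minimal Python sketch under my own assumptions; how S Voice actually does it is unknown, and the `find_conflicts` name and the sample calendar are invented for illustration:

```python
from datetime import datetime


def find_conflicts(new_start, new_end, existing_events):
    """Return titles of events whose time range overlaps [new_start, new_end)."""
    conflicts = []
    for title, start, end in existing_events:
        # Two half-open ranges overlap when each one starts before the other ends.
        if new_start < end and start < new_end:
            conflicts.append(title)
    return conflicts


# Made-up sample calendar: (title, start, end)
calendar = [
    ("Lunch with James", datetime(2012, 7, 21, 13, 0), datetime(2012, 7, 21, 14, 0)),
    ("Finish project", datetime(2012, 7, 21, 15, 0), datetime(2012, 7, 21, 16, 0)),
]
```

A new event from 1:30 PM to 2:30 PM on July 21st would be flagged against “Lunch with James,” while one from 2:00 PM to 3:00 PM would fit cleanly between the two entries, since back-to-back events do not count as overlapping here.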

New Task entries look exactly like this.

[Screenshots]

Music force closes, which means we’re on to Facebook. It’s pretty startling how much easier voice is. The first time you do it, it prompts you to log in and allow the Vlingo app; after that it’s smooth sailing. It doesn’t remember the initial entry that prompted the login though.

Again you can specify the command first and it will ask for the content. The third picture was supposed to be “I’ve got to stop using S Voice in my examples” which it somehow turned into “Hi back to stop you seen voice in my dimples.” That should give you an idea of how terrible the voice recognition is. It’s bad. It’s really bad. All these other pictures took several attempts to get something recognizable. I use Voice Actions daily with no problem.

Search brings up this crazy looking custom Google search. If you can get the Voice Recognition to cooperate, it works.

[Screenshots]

Opening apps works. If it’s not sure what you meant, it will even give you a list of available possibilities. Cool. “Record Voice” is broken.

If I say “Driving mode on” it responds with “Driving mode on.” Well then… glad that feature… works…? Timer and Alarm work great. They’re the exact same function. Turning Wi-Fi off is also pretty straightforward.

[Screenshots]

Weather is very pretty. If I didn’t already have it on my home screen and in my notification bar I would be impressed. Bizarrely worded questions like “Should I bring an umbrella?” even work and get you a simple “yes/no precipitation.” Protip, Samsung: sentences start with a capital letter.

Clicking the “more” button brings up Wolfram Alpha and WOW guys I really did not need the precipitation rate accurate to hundredths of an inch per hour. Calm down. I just wanted to know if it was going to rain.

[Screenshots]

The answer service works great, and just like Siri, uses Wolfram Alpha for everything. For broader questions you get a little embed; for directly answerable stuff it just spits out the answer. Larry Page’s middle name is “Edward,” who knew?

[Screenshot]

“Find a restaurant” gives you nice little phone call icons for each entry. The default of TWENTY listed entries is a little much though; some pagination every 10 or so would have been fine.

Overall, the capabilities of S Voice are really great. The interface is nice and I really like the ability to issue a command first and then the parameters. The “Hi Galaxy” wake command is amazing and a really handy feature. Its sensitivity is top notch; even something like “Hi Galaxy set timer for 2 minutes” is handled flawlessly.

The voice recognition, though, is pure crap. When your vocabulary is limited, like with the baked-in commands, it’s good enough to work, but try and dictate a memo, or a calendar event title, and it will fall flat on its face. Getting screenshots for this article was very frustrating and took way longer than it should have because of all the recognition issues. It’s nowhere near the level of Google’s Voice Actions.

Still though, like I said in the intro, this isn’t meant to run on a Galaxy Nexus. We’ll have to see how it performs on the hardware it was meant to run on. Recognition issues aside, this performed surprisingly well for being freshly ripped from a leaked ROM.

This hands-on was mainly meant as a look at the capabilities of S Voice, and they are pretty comprehensive. I really can’t think of something I’d want to do verbally that S Voice doesn’t support. If they can clean up the recognition issues, Samsung will have a really killer feature on their hands. We’ll just have to wait until the Galaxy S III launch!

Ron Amadeo Ron loves everything related to technology, design, and Google. He always wants to talk about the future and what’s next, and, in the case of Android, he’s not afraid to get knee deep in an APK for some details. Expect a good eye for detail, lots of research, and some lamenting about how something isn’t designed well enough.
google.com by Ron Amadeo
via http://www.speechtechnologygroup.com/speech-blog
