Friday, June 29, 2012

Study concludes that Siri's speech recognition accuracy is almost 90%

Despite some criticism from Siri users, its speech recognition capabilities are actually quite good. Where Siri still has to improve is its ability to execute properly on the query........

A new report about Apple Inc. (AAPL)’s voice-recognition software Siri concludes what many users have been saying for a while: It doesn’t work all that well.

Of 1,600 common searches, the speech technology accurately resolved the request 62 percent of the time on a loud street and 68 percent in a quieter setting, according to a report released today by Piper Jaffray Cos., the Minneapolis-based investment bank. The report graded the technology “D” for accuracy, while predicting it will improve as more features are added.

“You’re playing the lottery when you’re using Siri,” said Gene Munster, the Piper Jaffray analyst who conducted the study. “They have a plan to be more competitive, but it’s going to take a couple of years.”

Apple has made Siri the defining characteristic of its iPhone 4S, spending heavily on advertisements featuring actors such as Samuel L. Jackson and Zooey Deschanel. The ads have contrasted with the experience of many users, who have taken to customer forums and websites to complain that Siri doesn’t work as well as it’s being marketed. One group of customers even filed a class-action suit against Apple for false advertising. Apple has denied any wrongdoing.

Apple continues to build features for Siri. Earlier this month, the company said its new iOS 6 mobile operating system will support sports statistics and movie listings. As more applications like that are added and the accuracy is improved, the speech-recognition technology will displace searches that many users now perform through Google Inc. (GOOG)’s search engine.

Verbal Shopping

“Apple would love nothing more than to take that away from Google,” Munster said. He expects commerce features to be added eventually so people can shop by speaking out commands.

Munster said that while Siri is good at comprehending what a user is saying and will accurately repeat the question, it struggles turning those words into a correct answer. For instance, Siri will repeat old answers when a user is trying to ask a new question. The technology also struggles when trying to use speech commands to find directions, Munster said.

In Piper Jaffray’s tests, Siri was able to accurately decipher what a user was saying 83 percent of the time on the street and 89 percent in an area with low noise.

“Apple right now gets a ‘B’ in comprehension and a ‘D’ in accuracy,” Munster said. “There’s a big difference between comprehension and her actually doing what you want her to do.”
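For readers who want the arithmetic, the per-query tallies implied by the report's percentages can be reconstructed in a few lines (a back-of-the-envelope sketch: only the 1,600-query total and the percentages come from the article; the rounding to whole queries is an assumption):

```python
# Back-of-the-envelope reconstruction of the Piper Jaffray tallies.
# Only the 1,600-query total and the percentages come from the report;
# per-query counts are implied, not published.
TOTAL_QUERIES = 1600

def implied_count(total, pct):
    """Number of queries implied by a reported percentage."""
    return round(total * pct / 100)

comprehension = {"street": 83, "quiet": 89}  # % of queries heard correctly
resolution = {"street": 62, "quiet": 68}     # % of queries answered correctly

for setting in ("street", "quiet"):
    heard = implied_count(TOTAL_QUERIES, comprehension[setting])
    answered = implied_count(TOTAL_QUERIES, resolution[setting])
    print(f"{setting}: ~{heard} heard correctly, ~{answered} resolved correctly")
```

The gap between the two rows is Munster's point: on the street, roughly 1,300 of 1,600 queries were understood, but only about 1,000 were answered correctly.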


Apple's Siri Gets Below-Average Grade by Piper Jaffray

A customer demonstrates the Siri function on an Apple Inc. iPhone 4S smartphone outside the company’s store at Covent Garden in London. 

via http://www.speechtechnologygroup.com/speech-blog

Tuesday, June 19, 2012

5 'Evil Siri' Videos That May Make You Chuckle | PCWorld

Speech technology with an attitude..... 

You’re likely familiar with the wealth of online content that pokes fun at Apple’s Siri, the virtual personal assistant feature on the iPhone 4S. There’s even a genre that imagines the dark side of Siri, nodding to the Terminator mentality that someday the world really will be taken over by computers. Check out these fun videos that portray Siri as a dark force -- even Apple fans have to admit the concept can be humorous.

Siri Wants to Kill Zooey Deschanel and Take Over Her Body

You’ve likely seen the recent Apple commercials that show how brilliant, useful and generally awesome Siri is, making use of talent such as Samuel L. Jackson, Zooey Deschanel, and John Malkovich. Now the jokesters over at the Conan O’Brien Team Coco site have posted a video that parodies one of them. In their version, Siri is more resentful than helpful. Check it out:

Psycho Siri

If you missed this one when it was uploaded to YouTube back in February, you might like the special effects that portray Siri as a real deal killer. Be warned, however, the acting is atrocious. Even so, more than a million people have viewed this short film.

Siri: The Movie Trailer

This spoof put out by the website Rooster Teeth also depicts Siri as a murderer, but the acting is quite a bit better and Siri does some wickedly funny things, such as drag a guy by his shoe and use a “knife app” to stab someone else and take over his body. Check it out: "Siri: The Movie (Trailer)"

Siri Has a Dark Side: Hilarious Answers to Strange Questions

This video -- one of zillions you can watch on YouTube that show people getting Siri to say all sorts of funny things -- came out way back in October but will likely give you a good chuckle if you haven’t yet seen it.  The prodder tells her iPhone she wants to jump off a bridge and needs to hide a body. Will Siri oblige?

Siri 2: Judgment Day

Finally, this corny and low-budget video elucidates exactly why “Evil Siri” videos are so popular -- they get at the root of a secret fear many people have that computers can be too smart. “Think we’ll make it out of this alive?” asks the young woman actor in this video. Her costar, British comedian Graham Oakes, replies, “Well, we’re up against this sentient homicidal artificial intelligence the likes of which the world has never seen before, but despite that, yeah, I think we should be fine.”

 

via http://www.speechtechnologygroup.com/speech-blog

What the Apple and Facebook integration means

With Apple's integration of Facebook into iOS, you can now use voice recognition to dictate your Facebook status updates....

Facebook has taken a big bite of the internet with its close to nine hundred million user base, and recent developments seem to show that the social media network is nibbling into Apple as well. This is because Apple recently announced that Facebook will be deeply integrated into the new iOS 6 and it seems that these two giants are getting quite cozy. Rumours had swarmed the industry that these two companies were making serious union plans, but now finally, the bed is made and these two are showing each other some love.

Apple is already integrated with Twitter

In the past year, Apple had integrated Twitter into its iOS 5 and some had speculated that there were deep-rooted tensions between Apple and Facebook. In December, Facebook released statistics which indicated that it had close to 900 million users and that more than 50% of its users accessed it through their mobile devices. With such huge numbers, Apple just had to send a friend request and luckily, Facebook accepted.

Seamless sharing 

The new integration means that iPhone users who are Facebook fans will have an easier time enjoying their two favourite things. There will be no more repeated logins to Facebook, especially when the user goes from one app to another. What this means is that the path between Apple and Facebook is now a one-way street, or better yet, a corridor, with no doors leading in and out of the social network. iPhone users will be able to post to Facebook without logging in again, and they will be able to use Apple apps with Facebook easily.

Integration with native iOS apps

Facebook will be deeply integrated into iPhone applications such as the calendar. Therefore, Facebook events and even friends’ birthdays will automatically show on Apple’s calendar. Facebook users will also be able to use Apple’s maps app to directly post their location at a particular time, and the photo app to upload pictures directly to Facebook. Furthermore, they will be able to see what their friends have liked on iTunes, as well as which apps in the App Store their Facebook friends like or use.

Facebook and Siri link up

There will also be integration between Facebook and Siri: users will be able to dictate their posts using Apple’s Siri, and Facebook notifications will be included in the iPhone’s notification center. With such intimacy going on between these two strange bedfellows, some have wondered whether Twitter has been kicked out of the room. However, Apple has announced that Twitter and iOS are still friends, but that Facebook will be more deeply integrated into its mobile devices.

Privacy Concerns? I don’t believe so.

Some analysts have wondered what this new-found love will mean for both iOS and Facebook. Some have argued that there will be a lot of privacy issues, as these two companies are on opposite ends of the divide on this issue. Apple is big on protecting the privacy of its users and zealously guards users’ content. Facebook, on the other hand, is all about sharing, leaving only the decision of who actually gets to see what has been shared to the user. Analysts have therefore concluded that Apple kissing up to Facebook could leave an awful taste in the mouths of Apple users, and this could hurt its business significantly. These two companies will have to develop policies that strike a balance so that both iOS users and Facebook users will be happy.

Over-sharing? No – sharing is not automatic.

Other analysts have argued that the integration could hurt Facebook because people will forget that they are sharing. iPhone users will forget that they are logged in to Facebook, and this could lead to over-sharing of information. However, if they do not remember that they are actually on Facebook, they will not take part in other Facebook activities such as liking, sharing or commenting. Therefore, it will be like millions of people shouting through their iPhones with nobody actually listening. In the end, the social networks that are part of this integration will suffer.

Overall, this will be a positive update that further strengthens the Facebook/Apple relationship

However, there are several analysts who are optimistic about this union as they say that a marriage between two powerhouses can only result in good things. Only time will tell whether this merger will be well received by users and whether their honeymoon and union will last.

via http://www.speechtechnologygroup.com/speech-blog

Apple’s Siri is becoming a better conversationalist - CNN

Is speech recognition technology ready for real conversations?
Actor John Malkovich talks to his iPhone 4S in a current TV ad for Apple.

You’ve probably seen the new Apple TV ads with actor John Malkovich having what looks like the most charming chat of his life with Siri, the voice-activated “personal assistant” on the iPhone 4S.

“That’s pretty spectacular advice, actually,” Malkovich says after Siri tells him to avoid fat, read good books, take walks and “live together in peace and harmony” with everyone.

“I enjoyed this chat immensely,” he continues in his familiar soothing-creepy voice. “You are very eloquent.”

To many other iPhone 4S owners, however, Siri isn’t such a scintillating conversationalist. She often fails to understand what’s spoken to her, and many of her responses are little more than lists of Google search results. Disappointed iPhone users have even filed a class-action suit against Apple, claiming that Siri doesn’t work as well as advertised.

Fortune: Siri’s father comes to her defense

That may be about to change, however.

In the eight months since she debuted in October, Siri has been “studying up and learning a lot more,” Apple Senior Vice President Scott Forstall said during a presentation Monday at Apple’s annual developers’ conference. Forstall then previewed some advancements to Siri that will come this fall as part of iOS 6, the company’s next mobile operating system.

Apple is equipping Siri with new databases of knowledge, including the ability to retrieve sports scores. She’ll also be able to open an app for you, search movie showtimes, post Facebook updates, make restaurant reservations and provide turn-by-turn navigation to drivers with Apple’s new maps application.

But based on the glimpses we saw Monday, the most interesting improvement to Siri may be the language she uses in her answers, which already sound more natural and conversational.

For example, Forstall demoed Siri onstage by asking, “Who is taller: LeBron or Kobe?” (For the NBA stars, no last names were required, apparently.) Instead of directing him to a Web search or maybe pulling up info on Japanese beef, Siri answered without hesitation: “LeBron James appears to be slightly taller.”

Jason Gilbert, writing for the Huffington Post, called this exchange “the most important thing that was said on stage” over the course of Apple’s 90-minute event.

via http://www.speechtechnologygroup.com/speech-blog

Mobile Speech Recognition in Business Applications | SYS-CON MEDIA

Speech recognition has changed a lot for mobile devices.....

Today's powerful mobile computing platforms, 'smartphones', have so little in common with the “phones” of yesteryear that, with the introduction of tablet devices into the mix, we should start calling them Converged Mobile Devices (CMDs). While processor, battery and software technologies have taken leaps forward over the past couple of years, there is still one thing small mobile devices can’t be: big. And herein lies the rub, because a lot of the perceived limitations imposed by form factor are really just side effects of trying to make CMDs act like tiny laptops. Leveraging speech recognition in mobile business applications could be one way to enhance the power and functionality of mobile devices.

Many mobile devices already run speech recognition programs that allow users to use voice features for navigation and search. In fact, there are a number of mature applications that enable voice interaction with mobile devices. And there are plenty of practical reasons for enterprise mobile solution providers to start looking into speech-based UIs sooner rather than later. For starters, forty-seven states have enacted distracted-driving laws, some with very stiff penalties. They have good reason. A recent CDC survey reports that 52% of U.S. drivers ages 18-29 admit texting or e-mailing while driving at least once in the last 30 days, and more than a quarter report texting or e-mailing “regularly” or “fairly often” while driving. So first off, building speech-based UIs into apps designed for workers who spend a lot of time behind the wheel is an easy way to help encourage compliance with corporate distracted-driving policies.

There are other reasons that speech-based interfaces make really good sense on mobile devices, and these have to do with the science of speech processing and the kinds of use cases in which it works best.

Historically, one of the strong application areas for speech recognition has been in healthcare settings, and this is still true. Most speech-UI-equipped healthcare applications have been successful in large part because a speech recognition engine functions best after it learns a particular person’s (or limited group of people’s) vocabulary, intonation and accent. Individualized medical charting systems and personal mobile devices have something in common which tends to eliminate this problem: they are virtually always listening to the same person, so recognition becomes accurate quickly. In addition, mobile phones are optimal speech recognition hardware right out of the box.
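The single-speaker effect described above can be illustrated with a toy MAP-style adaptation update, in which a generic (speaker-independent) model mean is pulled toward one speaker's observed data as more of that speaker's audio accumulates. This is a minimal sketch with invented numbers, not any vendor's actual algorithm:

```python
# Toy illustration of speaker adaptation: a MAP-style update pulls a
# speaker-independent model mean toward one speaker's observed feature
# values. All numbers are invented for illustration.

def adapt_mean(prior_mean, observations, relevance=16.0):
    """Blend a prior (speaker-independent) mean with this speaker's data.

    'relevance' controls how many observations it takes for the
    speaker's own statistics to dominate the prior.
    """
    n = len(observations)
    if n == 0:
        return prior_mean
    sample_mean = sum(observations) / n
    weight = n / (n + relevance)
    return (1 - weight) * prior_mean + weight * sample_mean

prior = 0.0                 # generic model value for some acoustic feature
speaker_data = [2.0] * 64   # one speaker consistently measures ~2.0

# With few observations the model barely moves; with many it converges
# on the speaker's own value -- which is why a device that always hears
# the same person becomes accurate quickly.
print(adapt_mean(prior, speaker_data[:4]))   # still close to the prior
print(adapt_mean(prior, speaker_data))       # close to 2.0
```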

For many business applications, mobile apps with speech recognition can provide a more efficient way to perform common mobile tasks.




via http://www.speechtechnologygroup.com/speech-blog

Voice recognition software helps with MU, doc says

How voice recognition helps lower costs in the healthcare industry.....

SAN DIEGO – Voice recognition software has provided the means to lower transcription costs, improve efficiency and populate data for achieving meaningful use, according to Richard Gwinn, MD, director of urgent care at Sharp Rees-Stealy Medical Group in San Diego.

Rees-Stealy Medical Group has 19 locations, 400 physicians and 1,700 staff members, and is one of the largest, most comprehensive medical groups in San Diego County. The group offers primary and specialty care, laboratory, physical therapy, radiology, pharmacy and urgent care.

[See also: Do doctors have to be typists to get MU incentives?]

Prior to implementing Nuance Healthcare’s Dragon Medical voice recognition software, providers dictated or hand wrote all documentation, according to Gwinn. Transcribing notes took two to three days and was very costly. Handwriting was faster, but illegible. The group implemented an EHR, but soon found that populating it was too much work.

Two years ago, the Rees-Stealy group adopted Nuance’s Dragon Medical voice recognition software, and within ten months of implementation, the group went from recording 6,182 progress notes per month in Allscripts' Enterprise EHR to 19,020 notes, Gwinn says. Paper chart usage declined from 102,000 per month to 4,000 per month. The group lowered transcription costs by $800,000 to $900,000 annually, representing an 80 to 90 percent reduction.
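Those savings figures are internally consistent: if an $800,000 to $900,000 annual saving represents an 80 to 90 percent reduction, the group's prior transcription spend works out to roughly $1 million a year. A quick back-of-the-envelope check on the article's numbers (the prior-spend figure is implied, not stated):

```python
# Sanity check on the reported transcription savings: an $800K-$900K
# annual saving that represents an 80-90% reduction implies a prior
# spend of roughly $1M per year. The prior spend is implied by the
# article's figures, not stated in it.

def implied_prior_spend(savings, reduction_pct):
    """Total spend before the cut, given the saving and the % reduction."""
    return savings / (reduction_pct / 100)

print(implied_prior_spend(800_000, 80))  # ~1,000,000
print(implied_prior_spend(900_000, 90))  # ~1,000,000
```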

“It took me less than one-half hour from the time I first opened Dragon Medical to the time I was using it,” Gwinn says. “It’s been a life changing application. I go home earlier. I don’t have stacks of charts on my desk and the swelling has gone down in my fingers (from typing).”

With the advent of meaningful use, many physicians have recognized that while imperative, the task of manually entering data can be time-consuming. The adoption of speech recognition technology has enabled physicians at Rees-Stealy to focus more on patient care instead of documentation, Gwinn reports.

Gwinn says the Nuance software has a 99 percent speech recognition rate. “It’s wonderful for me, because now I can create charts accurately and concisely for patients and I can put them in the correct fields and I don’t have to touch the mouse, so I can do other things at the same time,” Gwinn says.

[See also: Medical transcription technology eliminates 30 jobs at Vermont hospital.]

Gwinn says Rees-Stealy is “among the most advanced groups in the country” when it comes to health IT and electronic health records. In addition, the group does “consistently very well on quality measures.”

Physicians were strongly encouraged to use the voice recognition software to populate the EHRs, and most have, but there have been a few holdouts, Gwinn says. 

As for Gwinn, he is 70 years old and wasn’t “in the least bit shy about adopting” the software. “I’m very enthusiastic about this,” he says.

via http://www.speechtechnologygroup.com/speech-blog

Monday, June 18, 2012

ICSI Research Launches New Speech Recognition Project

Research group sets out to improve speech recognition capabilities.....

The International Computer Science Institute (ICSI) announced a new research project focused on exploring automatic speech recognition (ASR) to understand the limitations and challenges of current technologies.

Sponsored by the Intelligence Advanced Research Projects Activity via the Air Force Research Lab, the research aims to produce conclusions that lead to new methods for improving ASR technology. The one-year project is expected to be completed by March 2013.

“This is a unique research project in that we are qualitatively and quantitatively exploring what is wrong with automatic speech recognition,” said Nelson Morgan, leader of the speech research activity at ICSI, in a statement. “From that we hope to gain insights into how we can improve ASR, potentially going forward in entirely new directions. When you don’t know specifically what is wrong with a technology, you are left with a hit-or-miss situation. This research should give us some clarity.”

The research project includes two major parts. The first is an in-depth look at the assumptions behind acoustic modeling, which is a key component of ASR that creates statistical representations of each of the distinctive sounds that make up words. This will enable ICSI researchers to discover technical challenges that prevent ASR from being more accurate.
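For context on what an acoustic model does: it assigns each candidate speech sound a score given the observed audio features, and recognition picks the best-scoring sound. A toy sketch of the idea, with two invented "phones" modeled as one-dimensional Gaussians (purely illustrative, not ICSI's methodology; real systems use high-dimensional features and far richer models):

```python
import math

# Toy acoustic model: each phone is a 1-D Gaussian over some acoustic
# feature; recognition scores a frame against every phone and picks
# the best. All numbers are invented for illustration.

def log_likelihood(x, mean, var):
    """Log-density of x under a 1-D Gaussian with the given mean/variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

phones = {
    "aa": (1.0, 0.25),  # (mean, variance) -- invented
    "iy": (3.0, 0.25),
}

def classify(frame):
    """Return the phone whose model best explains this feature value."""
    return max(phones, key=lambda p: log_likelihood(frame, *phones[p]))

print(classify(1.2))  # "aa" -- closer to the 'aa' mean
print(classify(2.9))  # "iy"
```

The assumptions baked into such models (e.g. that frames are well described by these distributions and are independent given the phone) are exactly the kind of thing the ICSI project's first phase sets out to examine.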

The second part is a broad survey of experts and colleagues in the field, asking for perceptions on where ASR technology is effective, where it fails, and what its shortcomings are. This study will include interviews with practitioners and a review of recent literature to derive community consensus on what approaches don’t work, and to develop guidelines for future analysis.

Steven Wegmann is serving as coprincipal investigator of the research, overseeing the in-depth acoustic modeling phase. Coprincipal investigator Jordan Cohen is heading the breadth field survey phase. Morgan is the principal investigator for the full research project.

via http://www.speechtechnologygroup.com/speech-blog

Microsoft in bid to match ads to moods | The Australian

Feeling a little sad today? A Prozac ad is about to appear to help you out....

Microsoft would use its Kinect Xbox controller, which includes a camera and voice recognition, to gather information by registering a person's mood based on behaviour or even facial expressions.

The system could also assess your feelings based on the language you type into a search engine in a move that takes the targeting of advertising into new realms.
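To make the text-based side of this concrete: the simplest version of mood-from-typed-language is keyword scoring, then mapping the inferred mood to an ad category. The sketch below is entirely invented for illustration (keyword lists, categories and all); it is not Microsoft's patented method:

```python
# Toy illustration of mood-targeted ad matching: score typed text
# against keyword lists and map the inferred mood to an ad category.
# Entirely invented; not Microsoft's actual method.

MOOD_KEYWORDS = {
    "sad":   {"sad", "lonely", "depressed", "miserable"},
    "happy": {"great", "happy", "excited", "celebrate"},
}

AD_CATEGORIES = {"sad": "comfort products", "happy": "event tickets"}

def infer_mood(query):
    """Pick the mood whose keyword list overlaps the query most, or None."""
    words = set(query.lower().split())
    scores = {mood: len(words & kws) for mood, kws in MOOD_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def pick_ad(query):
    mood = infer_mood(query)
    return AD_CATEGORIES.get(mood, "generic ad")

print(pick_ad("feeling sad and lonely today"))  # comfort products
print(pick_ad("weather forecast"))              # generic ad
```

Even this trivial version shows why the idea alarms privacy advocates: the signal is extracted from text the user never intended as a mood disclosure.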

The existence of the patent application was revealed in the US last week and it raises questions about how far advertisers should be allowed to go in targeting customers.

Consumer advocacy groups have long raised concerns about the amount of data that is being collected and the revelation that Microsoft is seeking to tap into emotions may further inflame the issue.

via http://www.speechtechnologygroup.com/speech-blog

Nexidia Unveils Interaction Analytics

Data and AudioMining across all channels for the contact center. A better way to stay on top of customer-contact center interactions.....

Nexidia has announced the release of Nexidia Interaction Analytics software for the contact center. Also released were Nexidia Advanced Interaction Analytics for the healthcare, communications, financial services, and technology fields.

Central to the Nexidia Interaction Analytics release are new multichannel searching capabilities across speech and text interactions, dynamic reporting on key metrics, and a new intuitive user interface.

“The big thing that’s new about this product is that it’s multichannel, meaning that instead of just processing phone calls, we’re also looking at other types of text interactions, such as chats, surveys, social media, and emails,” says Jon Ezrine, Nexidia senior vice president and COO. “We found that there are customers who are doing analysis that span across multiple touch points. Instead of being a speech analytics provider, we see ourselves as a customer interaction analysis vendor.”

Nexidia solutions allow contact centers to capture, synthesize, and disperse the business intelligence locked inside the different types of interactions with customers. Companies can now make sense of this unstructured data and deliver it to the organization in the form of dynamic metrics and dashboards, complete with drill-down access to actual customer interactions.

“In addition to analyzing our phone calls, we recently began to analyze other interaction channels with Nexidia, specifically Web chat,” said John Bowden, senior vice president, Enterprise Customer Care, at Time Warner Cable. “We are excited about Nexidia’s multichannel offering because we believe it will give us a more comprehensive view of our customer interactions.”

Key features of Nexidia Interaction Analytics include:

Multichannel Search: An extension of Nexidia’s analytics technology to enable rapid search and understanding of key phrases and topics across large bodies of speech, chat, survey, and email interactions. This provides a holistic view of customer interactions across the company for robust analytics and service optimization.

Query Builder: Offers the ability to save specific search criteria for future use. Now important search terms and phrases can be saved, easily reused when needed, and used to define the key performance indicators and metrics necessary to manage strategic improvements.

Dynamic Reports: New dynamic reporting using standard and user-definable metrics, such as customer sentiment and satisfaction, sales effectiveness, and profitability, are easily accessible. This puts the most important, up-to-date information easily within reach for daily management.

New User Interface: An updated user interface provides an intuitive and easy-to-use product for people in all roles, from senior managers to frontline supervisors.

Additionally, Nexidia is rolling out Advanced Interaction Analytics solutions, initially designed for the healthcare, communications, financial services, and technology sectors. The advanced solution offers:

Executive Dashboards: Additional dynamic dashboards and reporting on key performance indicators that specifically meet the needs of senior management teams. Executive Dashboards are customized to provide visibility of performance against main corporate objectives.

Deep Dive Analysis: Advanced Interaction Analytics provides additional capabilities for discovering trends and offering more in-depth root-cause analysis.

Managed Analytics Services: Nexidia offers optional Managed Analytics Services to help companies quickly get started with the analysis of customer interactions. The services team can either partner with an internal analytics team to provide additional support, or fully manage the interaction analytics program for companies that do not have internal resources readily available. The team can be engaged for a short term to get the customer up and running, or for continuous ongoing support.

“Today’s customer seeks service through all sorts of interaction channels, not just voice. That puts the onus on contact centers to delve through immense sources of information to understand what drives interactions,” notes Keith Dawson, Ovum’s principal analyst for customer interaction. “By adding multichannel analysis to the Interaction Analytics platform, Nexidia is helping usher in a new era in care: one where there are fewer silos and greater contextual awareness of what customers need.”

Nexidia Interaction Analytics solutions will be available in the third quarter.

via http://www.speechtechnologygroup.com/speech-blog

A little voice recognition please

Life with a speech-recognition-enabled TV set in your living room........


Samsung ES8000 TV

John DAVIDSON

Scene 1: A living room somewhere in Australia, June 2012. In strolls Dr John Davidson, a tall, fabulously handsome gadget reviewer from the Digital Life Laboratories, his suit jacket slung nonchalantly over his shoulder. (Note to producers: Can we please cast Jon Hamm to play me? Pretty please?)

Dr John: Hi TV! Power on!

Crickets chirp loudly in the distance. The TV remains resolutely powered off.

Dr John, louder this time: Hi TV! Power on!

The sound of the crickets gets louder too. Dr John positions himself in front of the TV and starts waving his hands in big circles; it’s like the wax on, wax off scene from The Karate Kid. Frankly, he looks like an idiot. The TV is nonplussed by it all, and stays off. The shot widens to reveal Christina Hendricks, Dr John’s wife. (Note to producers: Can we please get the actual Christina Hendricks to play Christina Hendricks? That would be awesome, thanks.)

Christina, putting down her martini: You know it’s a Sony, don’t you?

John: Oh darn it! It looks just like the Samsung I was reviewing in the Labs today.

Christina: Yeah, not so much. The Samsung has a camera on top and . . .

The sound goes all wonky and fades out, just as the shot goes wavy, like a TV that’s poorly tuned. This to indicate a flashback to the Labs, earlier that day. Some words crawl across the bottom of the shot: Based on actual events!

Scene 2: The Digital Life Labs. In strolls Dr John Davidson. He looks just as dashing in his fresh white lab coat. Maybe there are some lipstick stains on his collar? Just a thought.

Dr John: Hi TV! Power on!

The TV springs to life. For this is the Samsung ES8000 that Dr John is reviewing, a voice- and gesture-controlled TV that is almost, but not quite, as big and handsome as our hero himself. (Note to producers: If we can’t get Hamm to play me, how about Viggo Mortshisname?)

Dr John: Hi TV. Search all.

A bubble with the words “Search all” miraculously appears in the bottom left corner of the Samsung’s screen. Moments later another bubble appears: “Please speak now”.

Dr John: The Voice Darren Percival!

TV: Connecting . . .

TV: Connecting . . .

The screen is eventually filled with search results for “The Voice Darren Festival”. There are links to three YouTube videos (including one for Darren Percival, but not the one our tireless hero was looking for), three random-looking Facebook entries, and a “web browser” link, offering to search for “The Voice Darren Festival”.

Dr John: Close, but no cigar!

TV: Is it noisy around you?

Dr John, slower and louder this time: Hi TV. Search The Voice Darren Percival.

Success! The screen is filled with search results for “The Voice Darren Percival”. Dr John positions himself in front of the TV, and holds out his right hand in front of him in a sort of “stop” gesture. Honestly, he looks a little effeminate, like he’s about to break out in song: Stop, in the name of love, before you break my heart! Aware of this, he glances around the Labs to check no one else is around. Seeing he is alone, he starts waving his arms slowly at a tiny camera sitting atop the TV. Who does this guy think he is, the Pope or something?

Eventually, with his arms almost falling off from exhaustion as it took so much gesturing, our hero moves the Samsung’s cursor to the YouTube video he wanted – the clip from the TV show The Voice in which Darren Percival sings a duet with someone known only as “Brett”. Dr John clenches his fist like he’s in the Black Panthers or something, only of course he isn’t a Black Panther, for reasons that should be obvious to viewers. (Note to producers: unless of course we cast Don Cheadle as me, which would be awesome too.) The TV reacts to this odd gesture by firing up its YouTube app, and playing the video Dr John wanted. How strange.

Anyway, our hero sits down on the couch and watches and listens while the soulful sounds of Darren Percival wash over him. For Mr Percival is his favourite singer.

Too soon the video is finished, but our beloved Dr John is not sated. He does the fantastically stupid wax on, wax off gesture, which causes the Samsung to go back to the search results page. With much flapping of his arms, he cajoles the cursor to the bottom of the screen, clenches his fist in that Black Panther salute again, and effects a web browser search for “The Voice Darren Percival”.

The browser on the Samsung eventually pops up – none of this stuff is lightning quick, despite the ES8000’s $3500 to $4300 price tag – with Bing search results. Bing! Are they nuts? Yet, by some fluke, the top result happens to be the one our amiable pacifist is after.

His arms too tired from all that flapping, he picks up the Samsung’s new touchpad remote control, scrolls down to the result he wants, and clicks the remote. Up comes the website for the TV show The Voice, right there on his TV.

A little more hunting around, and Dr John tells the website to play another Darren Percival video.

Web browser on TV: ninemsn video . . .

TV: ninemsn video . . .

TV: ninemsn video . . .

This goes on for some considerable time. Our hero is nodding off. He is, after all, married to Christina Hendricks. Need I say more?

TV: Error 2046

Dr John, nonplussed Sony style: Error 2046? Are you @#$#$ kidding me?

TV bubble: Connecting . . . Search for Are you f—-ing kidding me in web browser?

Dr John: Perfect. Hi TV. TV Power off.

TV: TV power off. OK?

Dr John: Whatever.

Our scientist takes off his lab coat and heads home. He has been at work all of 15 minutes.

But who, after all, would blame him?

via http://www.speechtechnologygroup.com/speech-blog

Robin for Android: a new speech recognition app aimed at drivers

Will men ask the new Android speech recognition assistant Robin for directions?

A lot has been said about Apple’s Siri, including some interesting remarks by Apple’s co-founder Steve Wozniak. There are those who love it and those who, how can we put it, aren’t so keen. But everyone agrees that voice control and “talking” to our mobile devices is the future.

For Android users, there are a number of different apps available and the new kid on the block is Robin. Designed for drivers, the main functionality of Robin revolves around navigation, traffic, parking and gas (petrol) prices. But its makers, Magnifis, have also added in some knowledge about Twitter, reminders, and even the occasional joke.

Robin is currently in beta test across the US and can perform local searches (with the help of Yelp) as well as provide real-time traffic and parking information, weather updates, and so on. Using KITT from the classic ’80s TV show Knight Rider (and the not-so-classic 2008 show) as a model, Robin will be developed into a full personal assistant.

Since it is aimed at drivers, Robin doesn’t use search engines like Google or Wolfram Alpha to find out the answers to complicated questions like “What are the five biggest lakes in California?” But you can ask it a whole load of driving-related stuff like:

  • Find a burger place … Does it have a good rating?
  • Where can I park?
  • Go to 1234 Lombard San Francisco
  • How is the traffic?
  • Is it snowing in Moscow?

On the lighter side you can ask it for advice by saying something like “I need some words of wisdom” or for a smile try “tell me a joke.”

I had a quick chat with Robin and recorded the results in the video below.

Need some words of wisdom? Here are mine: Download Robin and give it a go, then let us know what you think by leaving a comment below.


via http://www.speechtechnologygroup.com/speech-blog

Sunday, June 17, 2012

Text-to-speech technology for Malta

An interesting tale of how text-to-speech models can be built even for the smallest user base

Access to information is crucial in today’s society. What use is there for information that cannot be accessed? How can a visually impaired or illiterate person use information that is displayed on a screen if that information cannot be read?

These questions, or rather, the answers to such questions, were the inspiration for a project that led to the creation of the first Maltese Text to Speech Synthesiser.

Whereas in the recent past, information was accessed through personal computers, people are becoming more familiar with different forms of technology, such as tablet PCs and smart phones, as well as with alternative means of interacting with computers. Popular examples are iPhone’s Siri and Nokia’s Audiobook Reader, which promote speech recognition and synthesis respectively.

These technologies make it easier for almost everyone to use computers. However, for many, they define whether or not one is able to use a computer. For example, textual information shown on computer screens is not accessible to illiterate or visually impaired people. Nevertheless, it becomes accessible with the use of speech synthesis technology, which reads out what is shown on the screen.

In 2008, the Foundation for Information Technology Accessibility (FITA) obtained funding via the European Regional Development Fund (ERDF), with 85 per cent of the cost shouldered by the EU and 15 per cent by the government, to work on the ERDF114 Maltese Speech Engine (MSE) project.

The project includes three main components, namely:

  • Research study

  • Software development

  • Publicity

Research Study

The research study, carried out by Fsadni and Associates in 2009, focused on identifying the potential need in Malta for this technology. It revealed significant interest in the MSE, with 44 per cent of respondents exhibiting a need for speech-enabled software that speaks in Maltese.

Software Development

The Malta Information Technology Agency (MITA) assisted FITA in drawing up a public call for tenders vis-à-vis the software development component of the project. The contract was awarded to Crimsonwing plc.

Work on the software, which consists of a lexicon (dictionary), speech synthesis module and two host applications, commenced in January 2010. The lexicon supports some of the functionality of the speech synthesis module and is also intended to provide a basis for future research, including speech recognition. The host applications are intended to display the functionality of the MSE.

Apart from the extensive involvement of end users in the design and testing process, FITA also required that three prototypes be produced prior to the final version. The testing involved the Education Directorate and a number of disability-sector NGOs, apart from entities either contacted by FITA or that expressed an interest in the project. The prototypes enabled FITA to monitor the progress of the MSE and provide end users with the opportunity to relay their feedback via FITA. Testing involved processing a considerable amount of data, recording the outcome of different user operations and using different speech-enabled products across different MS Windows platforms. All of this had to be done for multiple users. This development cycle has been ongoing for the last three years, resulting in the MSE software product that is due for release in just over two weeks’ time, on 26 June. The testing process enabled FITA to produce supporting documentation for the MSE, with instructions for integrating it across different software products.

Publicity

FITA has distributed and is still producing research documents and publicity material related to the MSE, including a new website and distributable DVDs. Following the launch on 26 June, the project will draw to a close with an information seminar in September. The software will be made available for free download on FITA’s website at www.fitamalta.eu or can be obtained on a DVD, for a fee, from FITA’s offices at Gattard House, National Road, Blata l-Bajda.

Can I use the MSE?

I am frequently asked who can use the speech engine, once it is available. Until recently, speech-enabled software supported many languages, but not Maltese. Many users of assistive technology were forced to use the English or American voice as a default language and the pronunciation of the Maltese text sounded very bad indeed. The MSE is compatible with the Speech Application Programming Interface (SAPI V5). It therefore supports any speech-enabled software that supports this standard. Examples include the Window Eyes and NVDA screen readers, the Grid 2, Clicker 5 and other communication and educational software.

The list of potential end users includes people with dyslexia, illiterate people, visually impaired people, those with intellectual impairment, children in pre-primary and primary education, and anyone who may tire of reading a newspaper or book and prefers having it read to them.

There are also many commercial applications that could use the voicing of prompts and commands in Maltese. Some of the most common applications are e-services and the automated handling of phone calls, where one may no longer need to rely on recorded voice snippets but can more easily update and modify voice messages. Maltese speech synthesis can also enable industry to customise equipment for Maltese-speaking machine operators.

E-services that rely on the Maltese language can be more user-friendly to Maltese speakers by using the MSE in order to assist computer access. Fuller inclusion within the information society will benefit Maltese society as a whole by empowering individuals to gain better access to education and obtain gainful employment. This is another step forward wherein, by minimising the digital divide, FITA enables individuals to contribute productively to society and the economy.

For more information please contact ERDF114 project manager Roger Davies-Barrett on tel. 2599 2178 or by email to roger.davies-barrett@gov.mt

What is FITA?

The Foundation for Information Technology Accessibility is the principal advocate and coordinator for making information communications technology (ICT) accessible for people with disability in the Maltese Islands. It was established by the Malta Information Technology Agency (MITA) and the National Commission for Disabled People (KNPD) on 2 October 2001 with the aim of facilitating the integration of people with a disability who find themselves at a disadvantage in a particular environment, by providing equitable and appropriate enabling accommodation.

The Foundation’s aims and objectives

• To promote equal opportunities for everyone, in particular in relation to information technology matters;

• To provide training services in information technology to disabled people;

• To gather and disseminate information and to increase awareness on information technology matters;

• To liaise with and facilitate public and private endeavours regarding the creation of equal opportunities in respect of information technology;

• To offer advice and consultancy services to private and public organisations in information technology and its use by people with a disability.

FITA’s principal function is to provide support to disabled individuals in overcoming or removing barriers to education and employment through ICT. Through empowerment and social inclusion, people with a disability will have less reliance on family and state support. FITA’s information services help disabled people in the selection, acquisition and use of an assistive technology device that is intended to increase, maintain or improve the individual’s quality of life. By ensuring that proper steps are taken to minimise the digital divide, individuals are able to contribute productively to society and the economy.

Roger Davies-Barrett is Project Manager ERDF114 Maltese Speech Engine (MSE) project

via http://www.speechtechnologygroup.com/speech-blog

Saturday, June 16, 2012

Google Already Falls Behind Apple In Local Business Listings

Who would have thought that Apple would overtake Google in the number of business listings?

Apple, as I’m sure you’ve heard, made a number of major announcements this week at its Worldwide Developers Conference. Some of them were search-related. Apple’s browser, Safari, for example, is getting search functionality similar to Google’s Chrome, as well as Baidu as a search option. Apple’s Siri is doing more in the way of retrieving answers related to sports, movie and restaurant queries.

The biggest search-related news to come out of the event, however, was that Apple dumped Google for its maps offering. That could be a big blow to Google-based local searches. Reports are now emerging that Apple will launch its new Maps project (due out with iOS 6 this fall) with even more business listings than Google has.

According to a Bloomberg report, Apple already has about 20 million more business listings than Google, at 100 million to Google’s 80 million. The report also quotes Google’s Brian McClendon as saying that Google Maps has over a billion active users. I wonder how Apple’s move will impact that number. Apple said at its event that it had sold 365 million iOS devices as of March.

Apple is now using TomTom as its primary Maps data provider, and Yelp integration has been highly publicized, but these aren’t the only data sources Apple is using. Greg Sterling points to an Apple copyright page (h/t: Matt McGee) that shows some other providers, which include: Acxiom, CoreLogic, DigitalGlobe, DMTI, Getchee, Intermap, LeadDog, Localeze, MapData Sciences Pty Ltd., MDA Information Systems, Urban Mapping and Waze.

Of course Google is doing its own thing in the local search space. In addition to making its own maps improvements, it’s tying business listings to Google+ pages, and giving businesses some more social ways of engaging with customers.

Apparently there’s a lot of money in local search these days. Yext also just secured a new $27 million round of funding to expand its business listings platform.

via http://www.speechtechnologygroup.com/speech-blog

Siri’s A Big Girl Now

Speech recognition and text-to-speech technology is growing up. And Siri is the best example of what it can do today.

Monday was more than a monumental keynote at Apple’s developers conference: It was also a coming out party for Siri.

It had been rumored, of course, that Siri would play a large part in Apple’s newest offerings. It was obvious from the beginning Apple was rather proud of their Intelligent Assistant baby. As soon as they saw her pop up in the App Store all those years ago (ok, 2 years ago) they knew they’d be able to give her a good home in Cupertino, California. Apple saw potential in her straight away and had to have her. She went through her awkward adolescent phase with as much grace as anyone else does in their Jr. High days, but now she’s ready to emerge fully blossomed, much smarter and ready to take on the world. And, as often happens in child-rearing, Apple has learned as much from Siri as Siri has learned from Apple. It’s a beautiful tale, and it was all unfurled for us on Monday as Scott Forstall explained all the ways Siri was ready to help.

Apple wasted no time showing off their baby girl, either, giving her opening credit duties.

(Before I move on, I must admit that, yes, I know Siri is only a piece of AI which is voiced by a woman in certain parts of the world. For the sake of the analogy, we’ll pretend as if the American version of Siri speaks for them all.)

Siri cracked some jokes at Google and Samsung’s expense, saying of Android’s sweet-toothed naming convention, “Who comes up with these names, Ben & Jerry?”

She then made the joke, “Honestly, I’m excited for the new Samsung…refrigerator! Hubba Hubba!”

Things haven’t always been so cheery in the Apple household…

The “teenage” years

Currently, when asked a question, Siri’s insistence on searching the web is the equivalent of a teenage girl responding with a shoulder shrug and a “dunno” when asked how her day went. Siri seemed willing to help, but only so much. When she does understand you, (she hardly understands me…and I was once a radio announcer in another life) she only does a few things well. Then, of course, there are the times when she is just completely unreachable, doing whatever it is adolescent girls do in their spare time.

She can send emails and texts, but will only read them aloud if you let her display them on the screen first. She will remind you to pick up milk the next time you’re at the store, but you also have to manually tell her where the store is…she isn’t yet wise enough to come to the conclusion of “Hey, so like I know you’re at a store, I don’t know if, like, this is the store you normally go to but, like you told me to remind you so, you know, I’m doing it.”

After all, if I’m told to get milk, I’ll take all the reminders I can get. Siri going the extra mile shouldn’t be seen as a nuisance. As it is, she’s only marginally helpful.

Thus, Apple’s insistence on referring to her as Beta.

It’s the most graceful way for them to shrug their shoulders and sigh in a way only parents can as if to say, “Kids…what are you gonna do?”

Siri and Apple have had their differences in the past. First, Siri angered some when she wouldn’t tell us where we could find an abortion. Making matters worse, she had no problem telling us where to find escorts or Viagra…

Then, Siri was found parroting a particularly naughty 4-letter word in Britain.

Finally, Siri became so ornery she began telling people in bars that another phone was better than Apple’s iPhone.

Movies, Sports and Traffic

But those days are in the past now. Apple’s finally had some words with their little girl, and it seems like some have stuck. She’s gone off to learn, to mature  and, as many do, discover herself.

Scott Forstall had the honors of announcing a new Siri to the world. Just as a proud father would, Scott started off by extolling the great things Siri can already do before launching into all the new tricks she’ll soon be able to do. In fact, Forstall sounded like a father who had been telling his buddies for months about how great his daughter has been doing in college. Now, she’s returned home, and Forstall wants to prove that he wasn’t making any undue exaggerations.

And just like a father would, Forstall couldn’t help himself, asking something about sports to begin with. Of course, boasting about Siri’s knowledge of sports was a bit of an odd choice. While no doubt useful (I know I’ll use it every day), the crowd full of typically sun-shy software developers couldn’t muster up much excitement. Still, I’ve tried before to ask Siri the score of last night’s Ranger game and was instead told she could perform a web search for “Siri, did the Texas Rangers win last night?”

Her new abilities to pop up the score in a beautifully designed score card—complete with box score—show her maturity. It looks as if Siri will be getting this sports information from Yahoo!, another slight diss to Google. It’s also likely Forstall’s team wanted to teach Siri all about sports to make her more appealing to Joe and Jane Q. Everyman. After all, as the iPhone reaches more hands, Apple will need to make sure the mainstream crowd is appeased. What’s more mainstream and homegrown than being able to ask about your local sports team?

Not only is Siri a bit of a flirt, learning all about your favorite team, she’s also a romantic, willing to go out for dinner and a movie. This is a trick she was once able to do before Apple brought her into their family. Sure, you can ask her right now if there are any good Pho places nearby. She’ll even sort them by rating. In iOS 6, she’ll not only list the restaurants by Yelp rating, she’ll also display what kind of cuisine is sold at a particular restaurant and how much cash you’ll need to bring with you. Apple has partnered with Yelp on this one to deliver all of this information right within Siri’s window. Tapping on a restaurant will bring up all sorts of extra information about the eatery, as well as display some Yelp reviews.

Do you see a new restaurant you’d like to try? Siri will also call ahead and reserve a table for you. Apple has also partnered with OpenTable (yet another skill she once had before Apple took her in), so tapping on the “book reservation” button inside Siri will open up the OpenTable app and book the table.

In the mood to see a Morgan Freeman flick? Just ask Siri what movies he currently stars in (he’s in everything…) and she’ll display them for you promptly. Apple has once again partnered with Rotten Tomatoes to display an average rating for movies. They’ve also integrated their own Apple Trailers, so viewing a preview of the movie is as easy as tapping a button, all right there within Siri.

This new, mature and robust Intelligent Assistant can also guide you as you make your way to the restaurant and theater.

As a part of Apple’s new Maps and traffic services, Siri also takes over on turn-by-turn duty, guiding you to your destination. The much-discussed Maps app also gathers traffic information from every other iDevice on the road, a beautifully anonymous solution to maps and traffic. As such, if Siri notices an accident or other roadblock on your route, she’ll offer you a new way. Should you be running on fumes on your way to dinner, you can also ask Siri where the closest gas station on your route is. She knows, and she’ll take care of it. Apple has also been bragging on Siri to some of the world’s largest car manufacturers—such as Audi, BMW, Chrysler, Honda and Mercedes, to name a few—and will begin to roll out what they’re calling “eyes-free,” a mode which will let you call up Siri at the push of a steering-wheel button, rather than actually reaching down for the phone and holding the home button for a while. Of course, should you get restless, you can ask Siri “Are we there yet?”

She’ll calmly tell you how much time is left on your journey.

Siri on the iPad, comfortable staying at home

With all these improvements to their Intelligent Assistant, Apple saw fit to give her a bit of a raise, allowing her to take control of new iPads as well as the newer models of iPhone. Now, those who want to take an evening in can ask Siri to post to their Facebook or Twitter (yes, now Facebook is as deeply integrated as Twitter in iOS) as well as launch a rousing game of Temple Run. She may or may not be able to take a bathroom mirror picture of you in your gym shorts… Forstall didn’t mention it, understandably so.

After all, how vain is it to keep asking your assistant to take a picture of you?

She is also coming to many other parts of the world, such as Canada, Italy and Spain and has learned several new languages as well.

What have we learned here?

And just what did Apple learn from this entire experience? They certainly swung for the fences with Siri, placing her at the forefront of the ads for their latest iPhone offering. They asked every iPhone 4S user to put her to work, using her to plan their day, move around their calendar and remind them to do all those little tasks which often slip our minds. Perhaps they thought a little too big, however, as Siri was unavailable in the first weekend we got to play with her. I have to think, however, that Apple has been listening in on our conversations with Siri, learning from the things we asked her. For instance, did Apple decide to make a move into sports because so many of us are already asking her what last night’s score was? Have we been wanting to know who is playing in the latest summer blockbuster?

I know I have, on more than one occasion, asked where the closest gas station is. I’ve also asked where the closest taco stand is…a taco finder is noticeably absent from Siri’s list of features, but I’ll let it slide. In all this learning, Apple has been able to tailor Siri to our needs and teach her all the things we want to use her for. Indeed Siri proved to be a little more intelligent than we thought, and hopefully Apple is as well. Calling Siri “beta” for so long was more of an admission of guilt than an admission of pride, and rolling her out as gradually as they have still proved to be a little more than their servers could handle. Now that she’s been out in the open all these months, hopefully they now know what it’s going to take to have a successful launch when they bring her out once more.

She’ll be here in the fall

Yes, Siri is finally able and ready to not only take on more tasks, but do them with a bit more grace and panache. She’s working well with others, such as OpenTable and other native apps, and she’s more willing than ever to listen and understand what you are asking of her. Hopefully by the time she arrives on our sparkly new iPhones and hopefully still-sparkly new iPads, she’ll be just a little embarrassed by her transgressions as a youth, but more importantly, ready to prove her worth to us once again. Yes, Siri is ready to move out of the house, stop hanging around the bars telling dirty jokes, and get some real work done. We only have till this fall to find out. I think we can give her one last summer to get the last of her indiscretions out of her system.

via http://www.speechtechnologygroup.com/speech-blog

MModal Brings Speech Recognition To Clinical Decision Support - Healthcare

Healthcare is one of the industries where speech recognition and text-to-speech technology has made a lot of progress

The launch of a speech recognition-based clinical decision support platform in the cloud by Franklin, Tenn.-based MModal is the latest step in the growth of systems that pull actionable information from unstructured electronic medical data. But some researchers believe current technologies are flawed and are kicking off an effort to pinpoint and then improve some of the shortcomings.

MModal, formerly known as MedQuist, this week introduced the first two applications in its new MModal Catalyst suite of products. One, called MModal Catalyst for Quality, puts data into context so provider organizations can improve documentation and coding, as well as meet requirements for Meaningful Use of electronic health records (EHRs). The other, MModal Catalyst for Radiology, structures information from radiology reports.

“So much of the data today that’s valuable is locked up in unstructured data,” Mike Raymer, senior VP of solutions management at MModal, told InformationWeek Healthcare. “We take every clinical observation and encode it with SNOMED,” he said. Similarly, prescription data gets encoded according to the RxNorm ontology and laboratory reports are matched to the Logical Observation Identifiers Names and Codes (LOINC) system.

This form of natural language processing—what MModal calls “natural language understanding”—helps with context to produce more accurate coding and documentation without having to perform full chart audits, according to Raymer. For example, in looking for whether a hospital administered aspirin to someone complaining of chest pains, the technology can search the patient’s chart to identify mentions of chest pain.
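The chart-search idea Raymer describes can be sketched, very roughly, as dictionary-based concept tagging: map surface phrases in free text to codes from standard ontologies, then search over the codes. The sketch below is purely illustrative and is not MModal’s actual pipeline; real systems use licensed SNOMED CT, RxNorm and LOINC releases and far more sophisticated natural language processing than substring matching. The two codes shown are commonly cited ones, but verify them against current releases.

```python
# Illustrative sketch only: a toy concept tagger in the spirit of what
# MModal describes. The mini-lexicon below is an assumption for the
# example, not MModal's data; real systems use full ontology releases.
LEXICON = {
    "chest pain": ("SNOMED", "29857009"),  # SNOMED CT concept for chest pain
    "aspirin": ("RxNorm", "1191"),         # RxNorm RxCUI for aspirin
}

def tag_concepts(note):
    """Return (phrase, system, code) for every lexicon phrase found in the note."""
    text = note.lower()
    hits = []
    for phrase, (system, code) in LEXICON.items():
        if phrase in text:
            hits.append((phrase, system, code))
    return hits

note = "Patient complaining of chest pain; aspirin 325 mg administered."
for phrase, system, code in tag_concepts(note):
    print(f"{phrase} -> {system} {code}")
```

Once every mention is encoded this way, the aspirin-for-chest-pain check becomes a query over codes rather than a full chart audit.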


“These will be tools used by providers as payers impose value-based reimbursement,” Raymer explained.

In the next three to four years, MModal expects to have 35-45 applications as part of the Catalyst suite, including modules specific to nursing documentation, long-term care, and home care, and for various medical specialties. “Our learning engine could be applied to readmissions management,” Raymer said.

Catalyst builds upon MModal Fluency, a service introduced last month that adds cloud-based speech capture to EHRs. Together, the MModal offerings are similar to what IBM and Nuance Communications, through their partnership with the University of Pittsburgh Medical Center, are doing with technology called clinical language understanding. “It’s immediate feedback,” Raymer said.

But is speech recognition accurate enough for precision applications such as healthcare?

The International Computer Science Institute (ICSI), a research lab at the University of California at Berkeley, disclosed this week that it is in the midst of a yearlong study of the limitations and challenges of current automatic speech recognition technologies.

“This is a unique research project in that we are qualitatively and quantitatively exploring what is wrong with automatic speech recognition. From that we hope to gain insights into how we can improve ASR, potentially going forward in entirely new directions,” ICSI deputy director and project leader Nelson Morgan said in a statement.

“When you don’t know specifically what is wrong with a technology, you are left with a hit-or-miss situation. This research should give us some clarity,” Morgan explained.

The research project, set to run through March 2013, will examine the scientific assumptions behind acoustic modeling to help identify potential technical challenges. It also will survey experts in the field of speech recognition to gauge their opinions about what does and does not work.



via http://www.speechtechnologygroup.com/speech-blog

Text-to-speech - Nokia

Voice synthesis technology for Nokia devices...

Let your phone read your messages out loud with Text-to-speech, a tool which allows you to listen to text messages, multimedia messages, and emails.

Text-to-speech is:

  • a convenient and hands-free way to access your messages

  • offered in a range of different languages and voices*.

Text-to-speech uses the Message Reader application built into compatible Series 60 mobile phones. The service is available only for selected phones where the application is preinstalled. To use Text-to-speech, you will need to install one language package and a corresponding voice package.


Downloading and Installing

  1. Check which of the lists below includes your phone.

  2. Proceed to the language selection page.

  3. Select the desired language package and download it to your PC.

  4. Select the desired voice package and download it to your PC.

  5. Transfer the language and voice packages to your phone using Nokia PC Suite.

 

For more information about installing applications on your phone, please see your phone’s user guide.

* The number of voices available depends on the language. Each language has at least one available voice.


YouTube Adds Automatic Captions in Spanish

iPhone 4S - What’s New In iOS 6?

Quick peek at some of the new features that iOS 6 will offer...

 

It is that time of the year again: time for an update to the iPhone’s operating system. iOS 6 came as a big surprise, and people are talking about it. So what’s new in iOS 6? Read on!

With the iPhone 4S’ virtual assistant, Siri, people were given a new level of smartphone experience. Its ability to answer odd questions made us love it. We all find Apple’s Siri to be quite helpful and clever at the same time. It can also send text messages on our behalf. These are just some of the cool things you can expect from Siri.

With the new iOS 6, Siri has been given a feature boost. It now has sports sussed: it can list game times and, best of all, scores and player standings. There are also integrated Yelp reviews for your favorite restaurants, and it can now tell you average prices. Another big plus is that you can now use Yelp for your reservations.

Siri also has movie knowledge. It can now give you movie listings and trailers, along with details on different movie stars: a voice-controlled version of IMDb just for your iPhone. Siri can now also launch apps; you get no listing, just straight action. And if you are driving a BMW, you will truly enjoy the hands-free technology: you do not even need to touch your handset to activate Siri, since the steering wheel can now do it for you.

If you are a new iPad owner, rejoice: Siri is now finally available on the new iPad. Where the device previously offered only voice dictation, you can now enjoy the full power of Siri. Of course, there are regional restrictions to consider for any of these new Siri features in iOS 6, so let’s all wait and see what other fun stuff it can give us.

 


Friday, June 15, 2012

Hacking and defeating Google's reCAPTCHA with 99% accuracy

Speech recognition and captcha solving...

One exciting thing about this: the entire model of reCaptcha (at least the text ones; I assume the audio ones are similar) is to make people do useful work when solving captchas by having them complete tasks that they consider too hard for computers to do well (in the text reCaptcha case, OCR). If someone writes software that can defeat the captcha, it does mean the security model is broken, but it also means the state of OCR technology (or audio recognition or whatever) has been advanced, and the digitization of books that had previously required human intervention can now be accomplished by automated means. In other words, spammers are incidentally creating the tools to expand the scope of digital human knowledge. Win-win, really.

reply

fuelfive 1 day ago | link

Unfortunately, this attack does nothing to advance the state of the art in OCR (or audio recognition). It's basically the same story as every other CAPTCHA attack to date: take advantage of some accidental statistical regularity in the generation function. As soon as this kind of flaw is discovered, it only takes a few hours for the generation code to be patched in such a way that completely prevents this sort of attack from working.

reply

fghh45sdfhr3 1 day ago | link

So either the code is easy to patch, or we DO advance. Win/Win?

reply

xibernetik 23 hours ago | link

Not really... Even if the code is difficult to patch, speech/audio recognition doesn't advance much when an attacker figures out how to remove the (non-random) noise added by a machine over the sound file. Actual speech recognition relies on the ability to filter out background noise - which is a lot more complex/random - added by surroundings, not a machine.

It's very difficult to generate some sort of noise via algorithm that a) humans can filter out and b) can't be removed by some algorithm. As a result, audio captchas are a huge vulnerability and the weakest link in almost any captcha system, although, by law, you can't get rid of them.

Hypotheticals aside, the code was easy to patch - note the footnote: > In the hours before our presentation/release, Google pushed a new version of reCAPTCHA which fully nerfs our attack.

reply

d2vid 19 hours ago | link

Could one take real recorded noise and add that, rather than noise generated via algorithm? Wouldn't that force attackers to solve a real problem (removing background noise from a speech sample)?

reply

xibernetik 12 hours ago | link

It's not really solving the "real" problem... If I'm just mashing two audio files together, that's going to be different from someone talking in the middle of a train platform, and there will likely be an algorithmically determinable difference between the artificially generated words and the naturally generated noise.

All of this aside, removing background noise is not a huge issue anymore. We have pretty decent noise-cancellation technology. Speech recognition - the other big component - has advanced a lot in recent times and is actually pretty good, although not for every company/product.

Even if it would be helpful, you'd have to record an incredible amount of noise in the first place, seeing as you're getting millions of hits a day and if you have a small sample set, the attackers will just figure out the solutions to that sample set and be done.

I'm not saying it's impossible, but I am saying it's probably not worth it at this point. Captchas (in their traditional forms) don't make sense as a long-term strategy anyways.

reply

robryan 14 hours ago | link

Yeah, you would think they could record thousands of hours of real world noise then randomly use sections of it on each audio captcha.

reply

A1kmm 5 hours ago | link

If the attacker manages to obtain all the random noise, they could index every window in the noise in a k-d-tree and perform an efficient nearest neighbour search for the exact background from the CAPTCHA audio, and then simply subtract the background, giving perfect segmentation in O(log(N)) asymptotic average time complexity for N windows (at 64kHz and 2000 hours of audio, N=460800000, log N = 19.95).
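If the background really were reused verbatim from a finite noise corpus, the attack sketched above could work roughly as follows. This is a toy illustration with made-up signal values; real audio would use overlapping PCM windows and a k-d tree (e.g. scipy's cKDTree) for the O(log N) queries the comment describes, rather than the brute-force nearest-neighbour search used here:

```python
# Toy model of the "index the noise corpus, subtract the matching
# background" attack. Signals are small integer lists standing in for
# audio samples; WINDOW and all values are invented for illustration.
# Note the nearest-neighbour match only finds the right window when
# the noise dominates the speech, as in a heavily-noised captcha.

WINDOW = 4  # samples per window (toy value)

def windows(signal, size=WINDOW):
    """Split a signal into aligned, non-overlapping windows."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def nearest(candidates, target):
    """Brute-force nearest neighbour by squared L2 distance
    (a k-d tree would make this O(log N) on average)."""
    return min(candidates,
               key=lambda w: sum((a - b) ** 2 for a, b in zip(w, target)))

def denoise(captcha, noise_corpus):
    """Subtract the best-matching background window from each captcha window."""
    corpus_windows = windows(noise_corpus)
    recovered = []
    for w in windows(captcha):
        background = nearest(corpus_windows, w)
        recovered.extend(a - b for a, b in zip(w, background))
    return recovered

# Demo: quiet "speech" mixed with a slice of the known, loud noise corpus.
noise_corpus = [50, -30, 20, 70, 10, 40, -20, 60]
speech = [1, 2, 1, 2]
captcha = [s + n for s, n in zip(speech, noise_corpus[4:8])]
print(denoise(captcha, noise_corpus))  # → [1, 2, 1, 2]
```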

reply

jopt 23 hours ago | link

That kind of Win-Win is also a Lose-Lose. It's just a glass-half-empty thing.

reply

TeMPOraL 20 hours ago | link

> If someone writes software that can defeat the captcha, it does mean the security model is broken, but it also means the state of OCR technology (or audio recognition or whatever) has been advanced, and the digitization of books that had previously required human intervention can now be accomplished by automated means.

No it doesn't. reCaptcha only checks one of two words it displays (the other one being what OCRs can't handle themselves), so naturally you only need to crack one and input garbage as the other, thus actually making the world a worse place.

reply

apendleton 20 hours ago | link

Probably not substantially worse. You would need to be able to tell which was which, since they're not consistently ordered and the "good" one is deliberately obfuscated/smudged/whatever, and, since recaptcha depends on multiple users agreeing on the right answer, you and other attackers would need to be consistent in your garbage in order for it to make it into the canonical book transcript.
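The agreement mechanism being described can be modelled as a simple vote tally. A toy sketch (reCAPTCHA's real thresholds and answer weighting are not public; the threshold below is invented for illustration):

```python
# Toy model of reCAPTCHA's agreement step: an unknown word only enters
# the canonical transcript once enough independent users submit the
# same answer, so uncoordinated garbage never converges. The threshold
# is a made-up value; the real system's rules are not public.
from collections import Counter

def canonical_answer(submissions, threshold=3):
    """Return the agreed transcription, or None if no single answer
    has been submitted at least `threshold` times."""
    if not submissions:
        return None
    answer, count = Counter(submissions).most_common(1)[0]
    return answer if count >= threshold else None

# Honest users converge on one answer; random garbage does not.
print(canonical_answer(["upon", "upon", "uporn", "upon"]))  # upon
print(canonical_answer(["xkq", "zzz", "asdf", "qwerty"]))   # None
```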

reply

anonymoushn 20 hours ago | link

For ReCAPTCHA, you only have to type the known word almost-correctly. You can just type anything for the word they don't already know. Even if someone had tech that could pass visual ReCAPTCHA reliably, it wouldn't need to be capable of doing useful work.

reply

aptwebapps 1 day ago | link

Are the audio versions sourced from recordings that need transcription as the visual versions are sourced from scanned documents? I assumed that was just to improve access.

If not, then it isn't useful work, really.

reply

vhf 1 day ago | link

Looking at how they hacked it, we can safely deduce the audio version is only there to improve access.

It's useless for transcription, because it 'validates' phonetically. This is part of the attack: whether the audio is "wagon" or "van", the 'word' "wagn" validates. Same for "spoon" and "teaspoon", which can both be validated by entering "tspoon" (since "Ts" can be said as "S", as in "tsunami" ("sunami"), or as "T S", as in T-Rex).
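A lenient matcher of the kind described could look like the following sketch. The skeleton rules and edit-distance tolerance are invented purely to reproduce the "wagn"/"tspoon" examples; the real reCAPTCHA validator is not public:

```python
# Hypothetical phonetically-lenient validator: guesses are reduced to
# a rough consonant skeleton, then compared by edit distance. All
# rules here are assumptions made to illustrate why one guess can
# cover several similar-sounding words.

def skeleton(word):
    """Reduce a word to a rough consonant skeleton."""
    word = word.lower().replace("v", "w")  # v/w sound alike (assumed rule)
    return "".join(c for c in word if c not in "aeiou")

def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def validates(guess, spoken_word, tolerance=1):
    """Accept a guess whose skeleton is within `tolerance` edits."""
    return edit_distance(skeleton(guess), skeleton(spoken_word)) <= tolerance

print(validates("wagn", "wagon"), validates("wagn", "van"))           # True True
print(validates("tspoon", "teaspoon"), validates("tspoon", "spoon"))  # True True
print(validates("zebra", "van"))                                      # False
```

One generic guess thus covers whole clusters of similar-sounding words, which is exactly why phonetic validation makes the audio channel useless as a transcription source.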

reply

apendleton 20 hours ago | link

That's a shame. I would imagine that a productively similar audio task could be devised around humans helping to transcribe hard-to-auto-transcribe audio recordings, and that it could use a similar strategy to the visual one: a known-good clip and an ambiguous one, requiring the user to transcribe both.

reply

vhf 16 hours ago | link

Well, I'm not a linguist or anything like that, but I'm not sure it would work so well, if it would be useful at all.

1/ Homophony. Knowing how to write a word you hear often needs contextualisation. Giving two whole sentences for the user to hear is too long and takes too much time. 'Right, but no need for a whole sentence, a few words suffice.' Right, but if the computer knows where to cut these sentences, it can also transcribe them itself. 2/ You have to assume good spelling from the user. 3/ Is it useful? I mean, 100M reCAPTCHAs are done every day; what fraction of those are audio? 0.0005%? Less? Transcribing 2 sentences every week makes no sense. Keep in mind that homophony plus bad spelling are two factors hugely increasing the number of times the same 'unknown clip' would have to go through 'human validation' before we could assume, with a certain level of confidence, that it has been transcribed.

Funny 4/, take a look at the [cc] button on some youtube videos : on the fly transcription. Thanks Google. :) Oh, and also on the fly translation of on the fly transcription, btw. Google even said they were working on Voice to voice translation for Google Voice : English speaker calls Chinese speaker, english voice transcribed, then translated, then synthesized, same the other way. :)

reply

throwaway1979 1 day ago | link

Google's captcha system is horrid. I've mentioned this to people on the accessibility team but to no avail. They used to have a wheel chair icon next to the bloody scrambled text. I taught a computer class to seniors and it was painful watching them deal with the account sign up process (also, I thought it was insulting asking a mobile senior to click on the wheel chair icon ... to the designer ... FU!). Clicking on the wheel chair would give audio that barely made any sense to me. The whole process was stupid.

Like many others, I can barely get through their captcha service. I'm actually happy people circumvented it. Maybe someone will think it through this time around.

reply

aidenn0 23 hours ago | link

http://en.wikipedia.org/wiki/International_Symbol_of_Access

reply

TazeTSchnitzel 1 day ago | link

OK. Show me a CAPTCHA that is easy for humans to read, and very difficult for computers to read.

reply

rytis 5 hours ago | link

It shouldn't be about being able to "read". Ideally it should be something that only a human could "solve". This may be a stupid example, but I can't come up with a better one (and if I could, I would be a lot wealthier by now ;) ): show a picture where the forest in a landscape is violet and ask the user to identify what's odd about the picture. Another example: show a face with two noses and ask them to identify the odd item. Things like that: easy for humans, impossible for computers. The problem is that these tasks need to be generated by humans, as I cannot think of any (irreversible) way to do this automatically. Maybe Mechanical Turk to the rescue?

reply

saraid216 21 hours ago | link

> Like many others, I can barely get through their captcha service. I'm actually happy people circumvented it. Maybe someone will think it through this time around.

Anytime you're ready, we're listening.

reply

entropy_ 1 day ago | link

Well, chances are it'll get even harder now, not easier, since they'll need to add further complexity to differentiate humans from bots.

reply

tmh88j 1 day ago | link

I also thought the wheelchair icon was insulting. A person in a wheelchair has problems walking, not (necessarily) with their vision. How about "Help" in text? Less confusion and possible anger.

reply

ceejayoz 20 hours ago | link

Apparently the wheel chair is actually an ISO standard.

http://en.wikipedia.org/wiki/International_Symbol_of_Access

reply

DanBC 18 hours ago | link

But it's a symbol for mobility - not for all disabilities.

There are symbols for visual impairment, but they're not international.

(http://commons.wikimedia.org/wiki/File:Pictograms-nps-access...)

(http://3.bp.blogspot.com/-HfEx4Y_O_Gs/Tf0huVPZBXI/AAAAAAAAAC...)

reply

excuse-me 1 day ago | link

Since the point of the audio version is not to be hit with lawsuits under the ADA - perhaps it should just be a little icon of a lawyer?

reply

PassTheAmmo 23 hours ago | link

I imagine that whole sites could be designed using only your proposed lawyer icon, possibly with some additional icon representing political correctness.

reply

Graphon 23 hours ago | link

What's the international symbol for a lawyer?

https://www.google.com/search?q=parasite+icon

reply

omonra 1 day ago | link

This may be very interesting to crack, but who is responsible for Google making their CAPTCHA almost impossible for a human to decipher now? I seriously have to click 5 times before I even see anything resembling letters I can parse.

reply

joelthelion 20 hours ago | link

This simply means computers are getting really good at this game. And that Google, with all its power, hasn't found a better alternative to Captchas yet.

I find that pretty worrying for the future of the internet. An internet without working captchas will probably be full of bots and spam.

reply

Jach 19 hours ago | link

Naive Bayes spam classifiers are fast enough that I think they could be used as drop-in replacements, and they'd be well-trained too considering Google's existing success with spam in email. And Naive Bayes isn't the only solution, there are plenty of other heuristics. Even something as dumb as "first post from this user/IP/'identity' and full of links?" would catch a lot of common spam.
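The Naive Bayes idea being proposed can be sketched in a few lines. This is a minimal illustration with toy training data; production filters train on millions of messages, use richer features, and tune their smoothing on real corpora:

```python
# Minimal multinomial Naive Bayes spam classifier with Laplace
# (add-one) smoothing, shown as a toy stand-in for the drop-in
# captcha alternative suggested above. Training data is invented.
import math
from collections import Counter

def train(examples):
    """examples: list of (text, label) pairs; returns a model dict."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return {"words": word_counts, "labels": label_counts, "vocab": vocab}

def classify(model, text):
    """Pick the label maximizing log P(label) + sum of log P(word|label)."""
    total = sum(model["labels"].values())
    best_label, best_score = None, -math.inf
    for label, n in model["labels"].items():
        score = math.log(n / total)
        counts = model["words"][label]
        denom = sum(counts.values()) + len(model["vocab"])
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)  # add-one smoothing
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train([
    ("buy cheap pills now", "spam"),
    ("click here to win money", "spam"),
    ("meeting rescheduled to monday", "ham"),
    ("lunch tomorrow with the team", "ham"),
])
print(classify(model, "win cheap money now"))     # spam
print(classify(model, "team meeting on monday"))  # ham
```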

reply

Jach 6 hours ago | link

I really wish HN required a comment with a downvote on accounts with more than 100 karma (where one can reasonably assume the user isn't a newbie leaving empty comments like "I agree"). I'd ask that person to think it through: why doesn't Gmail require a captcha every time you want to send an email? What about other mail providers? What about your own native client? "An internet without working captchas will probably be full of bots and spam." Is your inbox full of bots and spam? Mine isn't. Even my spam box is at less than 800 over the period of a month; before one of the big botnets that was generating most of the world's spam was taken down a few years ago, I still had less than 4000 over a month.

reply

masonlee 20 hours ago | link

Is this an argument that future web services may depend more heavily on identity/reputation services?

reply

duiker101 1 day ago | link

I didn't go in depth on the article/method, but from what I've read it takes advantage of the audio function, not the images.

reply

fibertbh 1 day ago | link

Agreed. They are unbearable. But name a better alternative!

Anything I can think of adding to the CAPTCHA, such as a half-line of letters above and below, or extra noise, would probably make cracking them programmatically even easier.

reply

verroq 1 day ago | link

You only have to type one word correctly, and they make sure one word is readable, so with some practice you can easily recognise which word is the key word.

reply

btilly 1 day ago | link

With practice and decent eyes. Once your eyes go downhill, it is amazing how much changes.

Most software developers are under 40 and therefore have little to no appreciation for what happens to people's eyes after they are past 40.

reply

ohgodthecat3 1 day ago | link

He isn't speaking of reCAPTCHA but of Google's nearly impossible-to-read captchas that take way too much time to decipher.

If you'd like to see it you can usually get it by putting in a bad password to a gmail account too many times (though I don't know if that has other consequences).

Edit: Here is an image example


reply

simonbrown 1 day ago | link

Strange they don't use reCAPTCHA for this.

reply

sp332 1 day ago | link

I wonder what they do for non-Latin locales?

reply

verroq 1 day ago | link

Oh those, yeah they are very hard to decipher.

reply

dkersten 1 day ago | link

For me, the key word is usually the unreadable one. I've never had success typing only the readable one.

reply

conradfr 1 day ago | link

After reading that previously on HN, I tried, and sadly had to type both words.

reply

s_henry_paulson 1 day ago | link

Here's the Ars Technica article, which does a much better job explaining it.