Wednesday, October 31, 2012

Speech Recognition for the Food Service Industry

Here's another example of a speech recognition implementation in warehouse management systems.

Lucas Systems, a provider of voice-directed warehouse applications, today introduced the next version of Jennifer FoodSelect, a voice-directed solution designed for foodservice and grocery distribution centers.

The latest release of Jennifer FoodSelect adds configurable voice-directed receiving, putaway, returns and replenishment, in addition to enhanced support for GS1 data standards and food traceability using the Engage Management Services Console. Like all Jennifer solutions, Jennifer FoodSelect combines voice direction with speech recognition (using the Serenade Speech Recognition Engine), and, where appropriate, barcode scanning, key entry, and display capabilities provided in industry-standard multi-modal mobile computers. The solution supports GS1 data standards and provides flexible voice- or scan-based data capture for Global Trade Item Numbers (GTIN), lot numbers, date codes, catchweights, and other information using Motorola, Honeywell, or other standard hardware terminals.
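A note for readers unfamiliar with GS1 data capture: every GTIN ends in a check digit, which is what lets a scan- or voice-captured number be validated on the spot. Below is a minimal Python sketch of the standard GS1 check digit rule; the function name and sample number are invented for illustration, and this is not code from Jennifer FoodSelect.

```python
# Validate the check digit of a GS1 GTIN (8, 12, 13, or 14 digits).
# Standard GS1 rule: weight digits 3, 1, 3, 1, ... from the right of the
# body, sum them, and the check digit pads the sum to a multiple of 10.
def gtin_check_digit_ok(gtin: str) -> bool:
    if not gtin.isdigit() or len(gtin) not in (8, 12, 13, 14):
        return False
    digits = [int(c) for c in gtin]
    body, check = digits[:-1], digits[-1]
    total = sum(d * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check

# Synthetic example: a 14-digit code whose last digit is its check digit.
print(gtin_check_digit_ok("00012345600012"))  # True
```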

The updated product includes the following:

  • Order Selection. Jennifer FoodSelect supports two-stage PIR picking and single and dual-pallet case picking. For foodservice warehouses, it includes on-demand case label printing that eliminates selector idle time, reduces paper handling, and adds points to individual productivity rates.
  • Truck Loading. Jennifer eliminates loading errors, improves efficiency for warehouse workers, drivers, and customers, and enhances worker safety. Standard, voice-enabled HACCP checklists help meet food safety and traceability requirements.
  • QC/Audit. The integrated QC/Audit module allows managers to better prioritize audits and focus scarce QC resources on the orders that need to be checked.
  • Receiving. Associates can use a combination of scan, screen, and voice to compare physical receipts against purchase orders or ASNs. Discrepancies are identified and data capture is immediate.
  • Putaway. The voice system directs workers through the putaway process, capturing and verifying pallet, item, and location information through speech recognition and barcode scanning.
  • Replenishment. This application can be integrated with voice-directed selection so that let-down requests are processed immediately to avoid short shipments.
  • Returns. Processing returns with Jennifer eliminates clerical steps to improve efficiency, minimize data entry errors, and accelerate inventory updates.
  • Engage MSC. Jennifer’s web-based console includes route planning and management, robust productivity and process tracking, and system configuration tools. Engage allows supervisors to manage and coordinate selection, replenishment, and other tasks to optimize overall efficiency and maximize throughput.

Upper Lakes Foods, a regional foodservice distributor in Minnesota, is the first customer to install this latest Jennifer voice solution, with immediate, dramatic improvements in selector accuracy and productivity.

“Jennifer FoodSelect addresses the accuracy and productivity challenges of foodservice and grocery DCs, in addition to supporting highly efficient product track and trace capabilities using industry-best speech recognition and barcode scanning,” says Jennifer Lachenman, vice president of product strategy and business alliances at Lucas Systems, in a statement. “From the beginning our goal in introducing Jennifer FoodSelect was to provide large and small foodservice and grocery distributors with the most comprehensive and configurable voice solution possible. This configurable industry solution approach is the best way to deliver a full-featured product while minimizing customization, speeding implementation, and enabling flexibility for the future. The idea is to accelerate deployment without sacrificing flexibility or important features food distributors need to compete.”


Google Introduces Enhanced Voice Search For iOS & Android Devices

The latest update to Google's voice search includes a text-to-speech response…

Source: Google

Google is now making it even easier to search and find the information you need. Today, the company introduced an update to its app for iOS and Android with a fancy new enhanced voice search.

The new voice recognition technology is simple to use. After you’ve updated your Google app, tap the microphone icon and start asking questions. Your words will appear as you speak. If the answer is short, Google will actually tell you the answer aloud.
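For the curious, the loop is easy to approximate in code. The sketch below uses the third-party Python packages SpeechRecognition and pyttsx3, which are not what Google's app uses, and the lookup step is a placeholder since the real answers come from Google's backend.

```python
# A rough sketch of the tap-mic, ask, hear-an-answer loop.
# Requires: pip install SpeechRecognition pyttsx3 pyaudio
import speech_recognition as sr
import pyttsx3

def lookup(question):
    # Placeholder for the real answer source (Google's Knowledge Graph).
    return "Here is what I found about: " + question

recognizer = sr.Recognizer()
tts = pyttsx3.init()

with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # brief noise calibration
    print("Ask a question...")
    audio = recognizer.listen(source)

try:
    question = recognizer.recognize_google(audio)  # speech to text
    print("You said:", question)
    answer = lookup(question)
    print(answer)
    if len(answer.split()) < 30:  # short answers get read aloud
        tts.say(answer)
        tts.runAndWait()
except sr.UnknownValueError:
    print("Sorry, I didn't catch that.")
```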

Google Search Engineer Kenneth Bongort shared the news on the company blog and tips his hat to the “Knowledge Graph” for making this possible. Google originally announced the Knowledge Graph back in May 2012. It’s an intelligent model that understands real-world entities and their relationships to one another. “Things, not strings.”

This is great if you’re driving and need to stay hands free. I’ve had to pull over before to manually type in what I needed. Plus, if you’re not the greatest speller, searching with your voice is ideal.  


Speech Recognition for the Warehouse

Using speech recognition in the warehouse for voice picking can provide a fantastic ROI…

Source: google.com

The use of voice technology in the warehouse is expanding, particularly for order picking. Voice-directed order picking involves the use of a wearable computer with a headset and microphone. The order pickers are instructed by voice on which items to pick and where to pick them. The pickers then verbally confirm their actions back to the warehouse management system (WMS) over a radio frequency (RF) link on the local area network (LAN); a minimal sketch of such a dialog appears after the list below. The list of potential benefits from using a voice system is impressive:

  • 99.9% plus order accuracy
  • 15% plus increase in productivity over typical bar coding systems
  • Removes trips back to assignment desk
  • Removes cost of printing and distributing picking documents
  • Removes cost of re-keying order amendments, picking confirmations and catch weights
  • Hands-free and heads-down — allows operators to focus on the task
  • Real-time feedback for proactive management
  • Real-time stock updating
  • Improves safety
  • Reduces training time — verbal prompts are easier to follow
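
To make the workflow concrete, here is a minimal sketch of a voice-picking dialog loop. The locations, check digits, and quantities are invented; a real system would speak and listen through the headset and post each confirmation to the WMS over the RF link, rather than using print and input.

```python
# Toy voice-picking dialog: direct the picker, verify location via a
# check digit posted at the slot, then confirm the picked quantity.
picks = [
    {"location": "A-12-3", "check": "47", "item": "10-lb flour", "qty": 4},
    {"location": "B-03-1", "check": "82", "item": "canned tomatoes", "qty": 2},
]

def say(prompt):   # stand-in for text-to-speech to the headset
    print("SYSTEM:", prompt)

def listen():      # stand-in for speech recognition of the reply
    return input("PICKER: ").strip()

for pick in picks:
    say("Go to " + pick["location"])
    # Reading back the slot's check digit proves the picker is at the
    # right location before any product is touched.
    while listen() != pick["check"]:
        say("Wrong check digit, try again")
    say("Pick {} of {}".format(pick["qty"], pick["item"]))
    while listen() != str(pick["qty"]):
        say("Please confirm quantity {}".format(pick["qty"]))
    # A live system would update stock in the WMS in real time here.
say("Assignment complete")
```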

Voice search shoot-out: Google’s new voice search vs Apple’s Siri, hands-on

How do the latest assistant releases by Apple and Google compare? John Koetsier puts them to the test.

Source: VentureBeat

Google just released its newest voice search functionality in what VentureBeat’s Ricardo Bilton called a direct assault on Siri, updating both its Android and iOS apps. I wanted to test both of them hands-on.

Siri is an idiot savante, sometimes more savante, and sometimes more idiot: she didn’t know where Washington DC is. Google, on the other hand, seems to have a bigger brain.

Siri does have two advantages. First, she has deep knowledge in a few select areas. And second, she remembers history, so she evaluates current questions in context. (That can be a good thing and a bad thing, as I found when I asked about traffic conditions to Vancouver, BC. I had previously asked about things to do in Vancouver, and to this traffic question Siri told me that the Canucks-Red Wings game had been cancelled.)

Google seems to be smarter, to know more.

Part of that might be because when Google doesn’t have the answer and it defaults to a web search, that seems very normal, versus when Siri defaults to web search (errr … Wolfram Alpha search) it seems like an option of last resort. But part of that is that Google just knows more — its massive knowledge graph is a cumulative result of 15 years of data gained from countless trillions of search queries in an effort to essentially emulate a Star Trek artificial intelligence experience.

Both fared well on some standard factual questions, in a sense. But oddly, Siri picked out the British version of The Office, while Google selected the U.S. version:

Source: John Koetsier

Siri vs Google: the cast of The Office

Another simple one: the time in a different time zone. Both answer correctly, with Siri adding a little more personality, and possibly, the start of a contextual conversation:

Source: John Koetsier

Siri vs Google: the time somewhere else

Google, with the wealth of data riding on its shoulders, is fairly confident when it gives you results. The results may not always be extremely helpful, though they usually are, but they are always delivered definitively.

Siri, on the other hand, is sort of like an unsure teenager, who ends every sentence trailing up in tone, transmuting statements into questions. When Siri isn’t sure, she checks Wolfram Alpha and gives you an answer with a conditional: this “might” answer your question:

Source: John Koetsier

What is a curveball? Both have the answer, though Siri’s not too sure …

In other cases, Siri is more idiot than savante, it must be said. For instance, any app purporting to help users search and navigate the world of information should know what Washington DC is, and how to find it.

Google does, and provides a map with directions. Siri, unfortunately, is painfully clueless:

Source: John Koetsier

How could you not know where Washington, DC is, Siri?

Siri’s lack of real, deep knowledge about the world is evident in other typically search-engine-style questions about companies and people.

In this case, Siri does not know who the CEO of Apple is, and rather schoolmarmishly tells me that everything I need to know about Apple is at Apple’s website. (As a journalist who focuses on Apple coverage, I just might beg to differ a little.)

Source: John Koetsier

Siri, you should really know the CEO of the company you work for …

But in areas where Siri has been directly connected to a deep source of specific data on a certain set of questions, she shines, giving detailed, information-rich, even visual answers.

Such as stock quotes:

Source: John Koetsier

A little insider tip, please …

That includes Yelp-assisted categories such as local search, where Siri has access to rich and detailed information on local restaurants, events, and entertainment. Google, on the other hand, unhelpfully responds to a search by giving me more search engines to continue searching in. Sorry, I’m searching to find, not searching to search.

(Note the vestiges of skeuomorphic design in Siri’s answer.)

Source: John Koetsier

Thanks for giving me search engines in response to a search, Google

One key difference where Siri has the potential to shine: context. Siri knows what you’ve just asked, and so you can build a chain of searches and steps to accomplish a task.

For example, you can ask Siri to send an email, and she’ll ask for the name of the recipient, and, in a subsequent step, the message itself, meaning that you can accomplish a complex series of actions without feeling like you need to do it all at once. Google can’t help you at all with those sorts of tasks at the moment, although apps like Edwin, Vlingo, Speaktoit, and Samsung’s S Voice can accomplish various parts of what Siri does.

In search, context helps too (even if Siri doesn’t always do the right thing with it):

Source: John Koetsier

Context helps … but that’s not the Four Seasons I was thinking of

This is of course not an exhaustive list … but it does highlight some of the differences.

Based on the results, and my preconceptions and prejudices, if I had to pick a personal assistant to help me do things on my phone, I’d pick Siri, of course. But if I had to pick one app for web search and information retrieval, I’d have to go with Google.

And frankly, given the data that Google has at its disposal, it’s hard to see how Apple can win this one.

I think if Apple were completely cold-blooded and logical about this, it would partner with Google for search results in Siri, not Wolfram Alpha, and focus Apple-specific Siri enhancements on strong vertical partnerships with services like Yelp, OpenTable, and airline reservations, as well as on phone control tasks.

That would, of course, be a seriously naive strategy. Which doesn’t mean it would be wrong.


Friday, October 26, 2012

IVRs without Frustration: Speech Recognition Gets the Human Touch

Here's a way you can achieve 100% accuracy without touchtone technology by allowing the caller to speak to your IVR: a mix of speech recognition and agent-assisted IVR technology.


Source: google.com

If you’ve ever used an interactive voice response (IVR) system – and unless you’ve been living on a desert island, you have – you’ll know that there are some truly bad systems out there. The earliest voice-driven IVRs that used speech recognition often required users to shout into the telephone, repeating themselves until they nearly spontaneously combusted in frustration when the system returned the message, “I didn’t understand your response” for the sixth time.

Luckily, the technology has come a long way.

Once upon a time, the number of response combinations that the systems could understand was very limited, which is why they had to restrict your responses (“say ‘one’ for the customer service department” instead of simply asking you to describe what you’re looking for in natural language). Those days are nearly behind us, thanks to newer solutions offered by companies such as Massachusetts-based Interactions, which offers a conversational natural language solution that allows people to speak to computers as if they were live agents.

Speech Technology Group (www.speechtechnologygroup.com) offers an agent-assisted IVR system that works in conjunction with the powerful Microsoft speech engine. The combination of the two offers the highest accuracy and extraordinary value.

Interactions’ solution leverages a combination of automated speech recognition (ASR) and what the company calls “human assisted understanding” (HAU). HAU improves accuracy and natural-language understanding by supplementing speech recognition when it can’t perform. In traditional speech-recognition applications, all requests get routed directly to an ASR engine. When the engine can’t recognize something, it keeps re-prompting the caller, or eventually gives up and transfers the call to a live agent. This limitation causes poor application design and performance – and frustrates callers. Interactions says it has overcome this application-design limitation.
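The routing idea itself is simple to sketch. Assuming a hypothetical ASR stub that returns a transcript plus a confidence score (real engines expose something similar), audio the machine is unsure about goes to a person instead of triggering a re-prompt. This is my reading of the article, not Interactions' actual implementation.

```python
import random

CONFIDENCE_THRESHOLD = 0.80  # illustrative cutoff, not a vendor value

def asr_recognize(audio):
    # Hypothetical ASR stub: returns (best hypothesis, confidence score).
    return "billing question", random.random()

def human_agent_pick_intent(audio):
    # Stand-in for "human assisted understanding": an agent hears the
    # clip and selects the matching intent.
    return "billing question"

def understand(audio):
    text, confidence = asr_recognize(audio)
    if confidence >= CONFIDENCE_THRESHOLD:
        return text  # the machine is sure enough on its own
    # Below threshold: hand the audio to a person, so the caller never
    # hears "I didn't understand your response."
    return human_agent_pick_intent(audio)

print(understand(b"fake-audio-bytes"))
```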

Rachel Metz of MIT Technology Review says it’s about more than simply routing calls with less frustration.

“Interactions’ software is, hopefully, more than a solution to impossibly annoying automated support systems,” writes Metz. “It’s also an example of software and human intelligence working together. Rather than relying entirely on software to handle calls, Interactions automatically hands speech that its software can’t cope with over to human agents, who select an appropriate response.”

Who would have thought that humans interacting vocally with computers could be a source of anything but a nervous breakdown?


Friday, October 19, 2012

Apple looking to add individual personality to text-to-speech voices

Here is a way to make text-to-speech technology even more appealing:


Source: google.com

The filing, titled “Voice assignment for text-to-speech output,” looks to create “speaker profiles” which can change the voice characteristics of TTS output to match parsed-out metadata like age, sex, dialect and other variables.

As noted by the application, many systems exist today to aid the visually impaired, including the system on Apple’s iPhone. However, most TTS engines “generate synthesized speech having voice characteristics of either a male speaker or a female speaker. Regardless of the gender of the speaker, the same voice is used for all text-to-speech conversion regardless of the source of the text being converted.” Apple’s invention proposes a different solution.
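In code, the proposed behavior might reduce to a lookup from parsed metadata to voice settings. The sketch below is a guess at the shape of such a "speaker profile"; the field names and values are invented for illustration and are not taken from Apple's filing.

```python
from dataclasses import dataclass

@dataclass
class SpeakerProfile:
    voice_id: str   # which synthetic voice to use
    pitch: float    # relative pitch adjustment
    rate: float     # speaking-rate multiplier

PROFILES = {
    ("female", "adult", "en-GB"): SpeakerProfile("uk_female_1", 1.1, 1.0),
    ("male", "adult", "en-US"):   SpeakerProfile("us_male_1", 0.9, 1.0),
}
DEFAULT = SpeakerProfile("neutral_1", 1.0, 1.0)

def pick_voice(metadata):
    # Metadata parsed from the text source (e.g. an email's sender)
    # selects a matching voice instead of one voice for all output.
    key = (metadata.get("sex"), metadata.get("age"), metadata.get("dialect"))
    return PROFILES.get(key, DEFAULT)

print(pick_voice({"sex": "female", "age": "adult", "dialect": "en-GB"}))
```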

This is a great read, and a fascinating concept. I was very skeptical as I started reading this piece, but as I understood all the nuances of the invention (pun intended), I found myself nodding. This could be an excellent advancement for TTS.


Wednesday, October 17, 2012

Google’s Search Lead Can Become Apple’s Advantage In 5 Years

Will the upcoming battle for search be fought with voice recognition technology?

Source: google.com

Many seem to believe that it is a foregone conclusion that Google has already won the search engine wars. After all, Google held 66.7% of the U.S. search market as of September 2012, with its closest competitor, Microsoft (MSFT), at a distant 15.9% and Yahoo (YHOO) at 12.2%. Furthermore, Google currently generates about 96% of its revenues from advertising driven by its search engine.

There has been a lot of focus recently on Facebook’s (FB) challenges in monetizing ad revenues as consumers shift to smartphones. The challenge rests with the diminishing screen size: because the screen of a smartphone is considerably smaller than that of a desktop or laptop, the number of advertisements that can be displayed is more limited. Such a consumer transition to a smaller screen is likely to affect display advertisements more than paid search advertisements, although it will still have some negative effect; hence it seems Google is better positioned than Facebook to weather the transition.

What if the screen size went to zero? How can the screen size go to zero? Voice recognition. Although in reality the screen is still there, when a user dictates his query and the phone provides an answer through its own voice, as Apple’s Siri does, the user is less likely to look at the screen. This will undoubtedly present a challenge to Google. Consumers may be fine being visually exposed to ten search results on their screen, but it is far less comfortable for them to actually listen to ten possible answers to their questions.

It is important to note that Siri at this time is not a fully fledged search engine. As a matter of fact, Siri can be instructed to use Google, Bing, or Yahoo in retrieving its results. Furthermore, Google has also developed its own voice-powered search for iOS. However, Apple will always retain the upper hand, as has been pointed out by Daniel Eran Dilger of Apple Insider:

Apple previously held up approval of Google Voice for over a year, and kept Google’s Latitude friend finder app in limbo for two years as it considered the features. This left Google to rely upon web app alternatives to native titles in the App Store.

Apple currently generates most of its revenues from sales of its products and services, with minimal advertising revenues. During the next five years, it is very likely that Apple will try to take a bite out of Google by targeting the paid search market through Siri. As a matter of fact, if Apple simply succeeds at becoming less dependent on Google in the search market, without a noticeable decline in user experience, it will consider such efforts well worth it. To that effect, Apple has recently announced the hiring of William Stasior, previously in charge of Amazon’s A9 search engine, paving the way for a further incursion into the search engine market.

During the next five years, the battle for search engine market share will be fought in the voice recognition arena, where Apple currently has advantages through its iPhone market penetration and Siri. Furthermore, given Google’s current dominant search market share of over two-thirds, such a battle presents only downside risk for Google, while Apple has potential upside. To conclude this article on the light side: the coming search engine wars will not be as entertaining as the Google-MSN-Yahoo search engine rap battles in this video.


Saturday, October 13, 2012

The Biggest Threat To Apple Right Now

Apple's reliance on third parties for critical applications - a growing concern and a challenge to keeping its competitive advantage…


Source: Business Insider

In July, we wrote that we were blown away with Google Now, the voice-controlled search service on the newest version of Android. The app is Google’s answer to Siri.

Except it’s better. Unlike Siri, Google Now can tap into Google’s search engine and return better results. Unlike Siri, Google Now is a finished product that just works. And unlike Siri, Google Now is actually useful and accurate.

Then there’s the whole Apple Maps debacle, which spurred a humble apology from Tim Cook and a new section in Apple’s App Store for alternative mapping apps. Without beating a dead horse, let’s just say Apple Maps are pretty awful, sometimes hilariously so.

There’s a pattern developing here.

Whenever Apple attempts a data-driven service like Siri or maps, it fails miserably. Both those services rely on a bunch of third parties like Wolfram Alpha or Yahoo (for Siri) and Waze, TomTom, or Yelp (for Maps). The results are two shaky products that can’t get you what you need the way Google can. When Apple cuts out Google, things get worse.

The New York Times touched on the subject last week when it wrote about Cook’s apology for Maps. One nugget in the story really stuck out:

The company’s weakness in this area could become a bigger problem over time as smartphones become more intimately tied to information and software on the Internet — a field where Google, which makes the competing Android phone software, has the home-turf advantage.

The article goes on to mention Ping and MobileMe as other Apple failures, but those are different. MobileMe was a way to sync contacts and email between your PC, the Web, and an iPhone or iPad. Ping was a social network baked into iTunes. Neither was a product that relies on massive amounts of data the way Maps and Siri do now.

And that’s where things look scary for Apple. Products like MobileMe can be tweaked and fixed. (MobileMe wasn’t that bad to begin with, but iCloud is definitely an improvement.) But if you don’t have a ton of data for services like Siri and maps, there’s nothing you can do. The product is only as good as the data you have.

This is the biggest problem with Apple cutting out all things Google from the iPhone. Google has had years to collect massive amounts of data from its users. As a hardware company, Apple has to hope others have the data it needs. Until recently, it relied on Google for that. Now it’s trying its luck with other companies. That’s not working.

What’s really worrisome is that Apple could take its battle with Google a step further and get into the search business, removing Google as the default engine on iPhones and iPads. It’s not very likely to happen, but there have been talks of Apple trying search since before it bought Siri a few years ago. If Apple really is trying to destroy Google, then don’t be surprised if search is its next target.

It’s a disaster waiting to happen.


Friday, October 12, 2012

Apple giveth and Apple taketh away: features users miss in iOS 6

Is it worth updating to iOS 6 yet?


Source: Ars Technica

Last month’s introduction of iOS 6 gave iPhone, iPad, and iPod touch users many things, like Do Not Disturb, new call features, and increased Siri capabilities. iPhone 5 and 4S users now have a cool new Panorama mode in the Camera app, there’s Facebook integration where once there was only Twitter, we can sync Reminders and Notes over iCloud, privacy controls have received a complete overhaul, and Passbook is showing some promise.

We also had some (major) things taken away from us with the release of iOS 6. The most obvious of these things is a consistent, built-in transit experience in Maps—or pretty much anything related to iOS 6 Maps at all. But Maps seems to be just about the only thing people tend to name when thinking about iOS 6’s shortfalls—or is it? A number of Ars readers wrote me with their own stories about favorite features that used to be part of iOS 5 but were since removed in iOS 6. So, I reached out to the Ars staff and Twitter in order to see what other little things users were missing.

It turns out there are a number of smaller things that people really liked but can no longer access. There are also a few features that people would’ve liked to see Apple take just a little further. Below is a list of the most commonly mentioned items that are either no longer part of iOS 6 or should be part of iOS 6.

Bring back app gifting

iOS users used to be able to gift apps to other people right from inside the device. When you navigated to a particular app through the iOS App Store app, there was an option to “Gift This App” underneath the ratings and next to “Tell a Friend.” This was particularly useful when it came to giving fun or interesting items to friends and family to check out, or for companies to gift apps to employees as a “thank you.”

This is unfortunately no longer the case in iOS 6. As noted by Apple’s own support document on the matter, this feature was part of iOS 5 and earlier (reverting is “usefully” listed as your only option if you want to continue to gift apps). This is undoubtedly thanks to some of the major App Store changes Apple has made since the release of iOS 6, but we can’t see why this feature had to be taken away. Users can still gift apps to other iOS users, but it has to be done through iTunes on the Mac or Windows.

iTunes Match and deleting music

Users who subscribe to Apple’s $25/year iTunes Match service were quite happy with the ability to delete specific songs from their iOS devices when the mood strikes. That was, of course, how things used to be with iOS 5.x, but it’s apparently no longer the case with iOS 6. Who would want to delete individual tracks anyway? That’s a fascist anti-album attitude there, son.

It turns out that this omission is quite irritating to iTunes Match users, though there is an inconvenient workaround for those committed to deleting music they don’t want from their devices. Users can turn off iTunes Match altogether on their iDevices and then delete the track from the Music app—but this only works if you’re trying to delete a song that was actually downloaded from iTunes Match. Turning off iTunes Match removes all other music that is available to you whether you want it there or not. You also have the option of going home and deleting it from your iTunes library. But then this wouldn’t be a very post-PC world, now would it?

Podcasts and music: no longer living together in harmony

Not everyone likes to mix podcasts and music into one playlist, but those who do liked to do it—a lot. I’ll admit that I’ve done this myself, especially when planning out a long flight or road trip. But since the introduction of Apple’s new Podcasts app—which has had its own challenges when it comes to usability—Apple has continued to separate podcasts from the rest of the pack.

It’s no longer easy (or possible, for that matter) for users to create on-the-go playlists within the Music app that include both music and podcasts. It’s not even possible to create a playlist within iTunes on the desktop that has a mixture of music and podcasts to sync—you can make the playlist alright, but once you sync, the podcast won’t show up in the same playlist on the iOS device.

Don’t cry for me, Google Street View

Aside from built-in transit, it turns out that a lot of iOS users made use of Google Street View as part of iOS 5 (and previous) Maps. This was by far the most commonly mentioned item when I asked about missing features on Twitter. Users apparently liked being able to see exactly what a specific address or street looks like when they’re navigating around on their iPhones. And some people (such as myself) have a poor sense of direction, so actual photos of the location you’re looking for can be a huge help.

Some would argue that 3D flyover mode within iOS 6 Maps is meant to replace this feature, and to some degree, it can. The 3D flyover feature does allow you to see what certain buildings look like at different angles, but it doesn’t quite fill in the gaps when it comes to walking or driving down an actual street at human level. What does a specific storefront look like in 3D flyover mode? You can’t get close enough to tell. Additionally, many iOS users don’t own devices that support 3D flyover mode—it’s only supported on the iPhone 5 and 4S—meaning that iPhone 4, 3GS, and original iPad owners have now lost useful features as part of Maps with no real gains.

Give us a YouTube app that can run in the background again

One of the benefits to Apple’s default YouTube app was that it could run in the background. Why would someone want to play YouTube videos (emphasis on videos) while doing other things on an iOS device? Sometimes videos show up and—in the case of video blogs or indie bands—all you want is to listen to the audio and you don’t necessarily need to see someone’s cat in order to get the full experience.

Apple got rid of its own YouTube app as part of iOS 6, however, which wasn’t quite seen as a huge deal by most users. This was mostly because Google was quick to release its own YouTube app to replace it, along with a few new features that Apple had been sluggish to implement. But Google’s new YouTube app can’t run in the background like Apple’s YouTube app did, which makes YouTube aficionados sad pandas when using their iOS devices. This is on Google’s shoulders, not Apple’s, but it’s a point that was made often enough by iOS 6 users that we thought it was worth including.

Details that should have been implemented, but weren’t

There will always be wish lists a mile long for features that aren’t part of iOS 6, but there are some features that are just baby steps away from being great. We don’t know what Apple’s reasons are for not adding some of these things, but we hope to see them in a future update. They’re not “pie in the sky” type wishes either, so Apple, if you’re listening, this is what we’d like:

  • Allow us to add contacts to groups from iOS. You can create and manage groups from the Contacts app on the Mac, and you can use your iOS device to interact with groups when it comes to the Phone app and Do Not Disturb. iOS is clearly aware that the groups exist, but when you find yourself adding new contacts into the device itself, there’s no way to add them to any particular group. If you want to group those new contacts, you have to sync the data over to your Mac and manage the groups there. Again, this is not very post-PC-friendly.
  • Shared Photo Streams should actually be sharable. I spoke highly about Shared Photo Streams in my iOS 6 review, only because I felt Photo Stream by itself was largely a pointless feature. But that doesn’t mean it’s perfect—the main problem with Shared Photo Streams is that Apple treats them like one-way blasts from one person to others. There’s no way to add other parties to a shared stream so that they, too, can contribute photos. If you have a group of friends who like to share photos, every individual with an iOS device would have to create his or her own shared stream to push to everyone else. Wouldn’t it be nice if you could add your siblings and cousins to one stream, and everyone could share in the fun?
  • Third parties want hooks into Siri. Siri gained some great new features as part of iOS 6, like the ability to look up movie facts and times, and it gained a plethora of sports-related features too. You can now launch apps through Siri, and make tweets or Facebook posts. But third-party apps still can’t plug into Siri directly—apps that do plug into Siri (such as Yelp or OpenTable) are only able to do so with Apple’s blessing for the time being. Some developers have been using a proxy server called SiriProxy to achieve this on their own, but it’s less than ideal when compared against the potential to make direct API calls to Siri.

Why Speech is Key for a Great Mobile Customer Service Experience

Mobile strategy and speech recognition…


Source: community.nuance.com

Fifteen years ago, analysts and investors routinely asked companies, “What’s your Internet strategy?” Today they ask, “What’s your mobile strategy?”

The corollary to both questions was, “If you don’t have one, you’d better get one – fast.” Then and now, that advice is sound because it emphasizes the fact that consumers have new preferences for the way they interact with companies. Failure to accommodate those preferences often leads to, well, failure.

There’s ample research showing why mobile phones have quickly become an important customer service channel. Today, more than 80 percent of customer service calls originate from a mobile phone. In many countries, including the United States, more than half of the population now owns a smartphone. Meanwhile, consumers increasingly say they prefer self-service options when interacting with a company.

Considering that a mobile phone is a device that most people have with them at all times, a smartphone app is an ideal way for businesses, government agencies and other organizations to accommodate self-service preferences. But simply creating an app doesn’t guarantee a great experience for customers. Instead, organizations need to consider several factors when developing their mobile strategy. 

First, even when a smartphone has a big screen, a physical QWERTY keyboard or both, many people don’t want to type their customer information and query. And when they’re driving or walking, typing is even less of an option.

Hence the appeal of a speech-enabled customer service app, which lets users simply speak their log-in information and question. Thanks to the popularity of speech-powered mobile assistants such as Siri on the iPhone, Google Now and Samsung’s S Voice, consumers are increasingly comfortable talking to their smartphone when they need information. Speech-enabled customer service apps leverage that familiarity.

Second, the key to a great user experience is selecting a flexible, feature-rich speech platform. For example, the ideal platform goes beyond the table stakes of speech recognition to provide Natural Language Understanding (NLU), which uses sophisticated algorithms to determine not only what the person is saying but also the intent.

Natural language understanding avoids the frustration that occurs when customers use a wide variety of vernacular terms rather than a limited set of industry terms. For example, suppose that a traveler launches her airline’s app and asks, “When does my plane leave?” instead of “When does my flight depart?” With NLU, the app can understand vernacular words and phrases, even ones that the platform has never heard before. This combination of flexibility and intelligence enables users to speak to the app as naturally as they would if they were talking with a live agent.

Without NLU, the app would struggle to find a match in its database because she didn’t use industry terms such as “flight” and “depart.” The app then would frustrate her by providing the wrong answer or by asking her to repeat herself. If she gets frustrated enough, she’ll probably close the app and call instead, eliminating the cost savings of self-service for the airline and eliminating the convenience of self-service for her.
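Real NLU relies on statistical models trained on large corpora, but the airline example can be caricatured with a toy keyword table; the intent name and synonym lists below are invented for illustration.

```python
# Map vernacular phrasings onto a single intent by counting how many of
# an intent's known terms appear in the utterance.
INTENT_VOCAB = {
    "flight_departure": {"plane", "flight", "leave", "depart", "take off"},
}

def guess_intent(utterance):
    words = utterance.lower()
    best, best_hits = None, 0
    for intent, vocab in INTENT_VOCAB.items():
        hits = sum(1 for term in vocab if term in words)
        if hits > best_hits:
            best, best_hits = intent, hits
    return best

# Different wording, same intent:
print(guess_intent("When does my plane leave?"))    # flight_departure
print(guess_intent("When does my flight depart?"))  # flight_departure
```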

The airline example also shows that businesses and other organizations have only one chance to make a first impression. That might sound cliché, but it’s also true. Consider the bottom-line impact when an app does a lousy job of meeting customer expectations for self-service: the organization now has to add contact center staff to field all of the calls that could have been avoided if the app were capable of meeting customer needs.

When a speech-enabled app meets or even exceeds customer expectations, it also provides another barrier to churn and all of the costs that come with turnover. A great app also stands out from the enormous number of bad customer service apps in the marketplace, helping the organization rise above the pack.

In fact, across all types of free apps – from customer service to games to news – the abandonment rate is 95 percent within the first 30 days after users download them. To avoid that fate, organizations should speech-enable their customer service apps to ensure they deliver the right answer right away, which is key to providing the kind of value that keeps users from abandoning the app – or even abandoning the company as customers.


Futurist Ray Kurzweil Wants to Move Your Brain Into the Cloud

Some interesting thoughts from one of the brilliant minds in our industry…

Source: google.com

Ray Kurzweil, author of The Age of Spiritual Machines and a pioneer of artificial intelligence software, has always been one of the most provocative thinkers on technology and its future. When he spoke at the Demo conference last week, it was no surprise that he covered everything from why computers will continue to get better at an exponential pace to how we will be able to expand our brains into the cloud with the help of biological devices in our bloodstream.

Promoting his forthcoming book, How to Create A Mind, Kurzweil spoke about technology but soon moved to a discussion of the brain, how he thinks it works, and how we will be able to enhance it.

Artificial intelligence is making discernible improvements in many areas, Kurzweil said. IBM’s Watson used its “total recall” of 200 million pages of natural language documents to win the Jeopardy! challenge. It wasn’t as accurate as people in understanding individual pages but was successful because it could digest so many more documents.

Speech recognition is working quite well, he said, noting that people are critical of things like Siri, but that it’s pretty amazing that people are talking to their computers at all. Google’s self-driving cars are also doing very well, he added.

He defended his theory of exponential growth (popularized in The Age of Spiritual Machines) saying that compute capabilities have actually been following this path since the 1980s. While critics say silicon scaling (known as Moore’s Law) can’t continue forever, that’s actually the fifth paradigm to bring exponential growth in computing. He expects self-organizing 3D transistors will be the sixth paradigm.

He was most animated, though, when talking about progress in “reverse engineering the brain.” Improvements in technology such as MRI spatial resolution have led to a better understanding of how the brain works. He espoused a thesis about the uniform structure of the neocortex, saying it is made up of 300 million undifferentiated “pattern recognizers” in a hierarchical structure. The difference in the number of pattern recognizers compared with other animals is exponential, giving humans enough capability to create art, science, and literature.

Our brains develop those 300 million modules at a very young age. One reason kids can learn language or music so easily is that they haven’t filled up the modules, he said, but by the time we’re 20, we’ve filled them up. Therefore, we need to be able to remove things intelligently. People who are rigid will have difficulty doing that, he said, but you can learn new material at any age if you are able to move on and forget other things.

Still, Kurzweil said we have a limited capacity, but he is optimistic we can overcome this by “expanding our brains into the cloud.” Techniques that evolved in the biological brain are the same that are used for things such as speech and character recognition, and they will be used to expand our brain power.

Search engines already act in this way for many people, he said. As a result, we are now smarter and both individuals and work groups can do more.

Kurzweil also likes the potential of Google Glass to do things like identify people you meet, give you directions, and basically listen to your conversations to give you information that “overlays” the real world.

“You’ll just get used to an assistant helping you,” he said, calling such things “mind expanders.”

In the long run, he doesn’t think we’ll have hardware implants in our brains, but rather biological devices that live in our bloodstream and will give us more capabilities. These will eventually be able to functionally recreate the pattern recognition ability of the brain. This, he believes, will lead to a qualitative leap in understanding, similar to the jump between apes and humans with the expansion of the neocortex.  

This might be 25 years off, he said, but already there are a number of medical devices that can be put into the blood. In the meantime, we’ll all have more intelligence from our devices, even if they are not physically inside our bodies.

Asked by Demo host Matt Marshall about the potential downsides, Kurzweil said technology has always been a double-edged sword. Fire has been used for good and bad, he said; artificial intelligence can be as well. He noted that AI is now very widely distributed. “A kid with a smartphone in Africa has access to more information than the president of the United States did 15 years ago,” he said.

If you have an AI that is smarter than you and it turns on you, you’re in trouble, Kurzweil said, unless you get an AI on your side that is even more intelligent. This isn’t an issue in AI today, but is an issue in biotechnology. Software viruses have gotten more and more sophisticated, but they haven’t shut down the Internet or stopped people from using computers. Instead, we have an evolving technological immune system that is “more or less working,” he said.

Of course, ideas like this have been played out in science fiction for years, but many more of them may soon become reality. Kurzweil may be wrong in some specifics, but he’s always entertaining.


Tuesday, October 9, 2012

Google deploys ‘virtual brain’ for smarter searches

Google is using its massive computing power to simulate the human brain and improve the results of its speech recognition engine…


Source: google.com

After being taught to recognize cats, people and other images by watching YouTube videos, Google’s virtual brain technology is to be put to the ultimate test: making Google’s products smarter.
 
The virtual brain technology, which is patterned after how brain cells operate, will first focus on speech recognition, according to an article in the Technology Review.
 
“Most people keep their model in a single machine, but we wanted to experiment with very large neural networks. If you scale up both the size of the model and the amount of data you train it with, you can learn finer distinctions or more complex features,” said Jeff Dean, an engineer helping lead the research at Google.
 
Google’s learning software simulates connected brain cells that form a neural network that teaches itself to react to incoming data.
 
Such neural networks can learn without the need for human assistance, and can go beyond research demos to be used in the field.
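 
As a toy illustration of "teaching itself to react to incoming data," here is a single artificial neuron learning a simple pattern from examples by gradient descent. Google's networks are enormously larger, but the learn-from-data loop is the same in spirit.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # incoming data
y = np.array([0, 1, 1, 1])                      # pattern to learn (logical OR)

w, b = rng.normal(size=2), 0.0                  # random connection strengths
for _ in range(1000):
    z = 1 / (1 + np.exp(-(X @ w + b)))          # sigmoid "cell" activation
    grad = z - y                                # error signal
    w -= 0.5 * (X.T @ grad) / len(X)            # adjust the connections
    b -= 0.5 * grad.mean()

# After training, outputs approach the target pattern [0, 1, 1, 1]:
print((1 / (1 + np.exp(-(X @ w + b)))).round(2))
```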
 
Last June, Google made headlines when its engineers publicized the results of an experiment involving 10 million YouTube video images.
 
The simulated brain cells involved 16,000 processors across 1,000 computers running for 10 days.
 
When applied to speech recognition, the neural network is expected to benefit Android, Google’s smartphone operating system, as well as its search app for Apple devices.
 
So far, the neural net is working on U.S. English.
 
“We got between 20 and 25 percent improvement in terms of words that are wrong. That means that many more people will have a perfect experience without errors,” said Vincent Vanhoucke, a leader of Google’s speech-recognition efforts.
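 
The figure Vanhoucke quotes is presumably a relative reduction in word error rate (WER), the standard accuracy metric for speech recognition: the word-level edit distance between what was said and what was recognized, divided by the number of words said. A 20 to 25 percent relative improvement means going from, say, 10 wrong words per 100 down to about 8. Here is a minimal implementation of the standard definition.

```python
def wer(reference, hypothesis):
    # Word error rate: Levenshtein distance over words, divided by the
    # number of reference words, computed by dynamic programming.
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("call me at noon today", "call me at new today"))  # 0.2
```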
 
Other Google products
 
The neural network is expected to boost other Google products, such as its image search tools that can understand the contents of a photo without having to check its accompanying text.
 
Even Google’s self-driving cars and Google Glass are expected to benefit from such software, Technology Review said.
 
Next steps
 
Dean said his team is now testing models that understand both images and text together.
 
“You give it ‘porpoise’ and it gives you pictures of porpoises. If you give it a picture of a porpoise, it gives you ‘porpoise’ as a word,” he said.
 
Another next step could be to have the neural net learn the sounds of words, leading to speech recognition that can get extra clues from video, Technology Review said.
 
It added that Google’s self-driving cars can also benefit by understanding their surroundings through the real-time data they collect.
 
Yoshua Bengio, a professor at the University of Montreal who works on similar machine-learning techniques, said Google’s work is a step closer to creating artificial intelligence that can match animal or, soon, human intelligence.
 
Bengio said Google’s neural networks work similarly to the visual cortex in mammals, the part of the brain that processes visual data.
 
“There’s no way you will get an intelligent machine if it can’t take in a large volume of knowledge about the world,” he said.
 
But for now, he said Google’s neural networks still cannot perform many things necessary to intelligence, such as reasoning with information collected from the outside world.
 
For his part, Dean said Google’s neural networks have humans beat in some areas.
 
“We are seeing better than human-level performance in some visual tasks,” he said.

Monday, October 8, 2012

Changing Network Connections Silence Apple’s Siri

If the speech recognition feature on your iPhone is not working, here is why:

Source: google.com

A German publication claims that flaws in iOS cause Siri, the virtual assistant, to occasionally not respond.

Heise.de reports that Siri may not work if a user changes the type of network connection. Switching from Wi-Fi to 3G/4G or between LTE and 3G allowed the staff to reproduce errors and prompt the feature to hang. According to Heise, both iOS 5 and iOS 6 are affected, and Siri is the only feature on the iPhone that becomes unresponsive when the network connection is changed. Only a reboot of the iPhone brings Siri back to life.

Apple did not comment on the report and did not confirm that there may be a problem that impacts the functionality of Siri.


5 Reasons to Scrap Our Patent System: #1. Apple’s Siri

Some interesting background information about the patent situation around Siri:

Source: Forbes

The Apple (AAPL) iPhone features a flawed voice recognition service, Siri. And the patent battle that wiped out its inventor and his three decades of work is a great example of why America’s current patent system needs a revamp.

On October 6th, I was lost on a back road in central Massachusetts. My wife turned on her iPhone and asked Siri to tell us where we were. Siri’s answer: Lansing, Michigan. If Siri worked right, it would have told us we were in Holden, Mass. — it was only off by about 750 miles.

Nevertheless, Siri exists and Apple owns it. But according to the New York Times, Michael Phillips spent three decades “inventing software to allow computers to understand human speech.”

In 2006, he co-founded a voice recognition company, Vlingo; its technology was incorporated into Siri before Siri went into the iPhone, and the company was the subject of partnership negotiations with Apple and Google (GOOG).

But then our busted patent system intervened in the form of a threatening phone call from the CEO of Nuance Communications (NUAN), a big voice recognition firm. The Times reports that Nuance offered Phillips two choices: sell Vlingo to Nuance or face a barrage of patent litigation.

Nuance owned a “broad voice recognition patent” that it used to file six patent lawsuits against Vlingo. Phillips redirected $3 million of his company’s R&D budget to defend the suits, lost his partnerships with Apple and Google, and sold his company to Nuance in December 2008.

Like many things in America, patents were set up with a good purpose that has been twisted beyond all recognition by large corporations. For example, freedom of speech is a basic right but the January 2010 Supreme Court Citizens United decision turned money into speech and now those with the biggest bank accounts have the power that goes along with that money-mediated speech.

Patents were established to give inventors an incentive to take the risk of coming up with new technologies in exchange for protection against others who would steal the idea before the inventors could commercialize it.

In theory, an inventor should only get a patent if an invention is “novel (substantially different from what exists), not obvious (one can’t patent a new toaster simply by expanding it to handle five slices of bread), and useful (someone can’t patent an invisibility machine if invisibility is impossible),” according to the Times.

But thanks to our overtaxed patent system, those tests are not applied rigorously. A patent lawyer who spent seven years as an examiner told the Times, “If you give the same application to 10 different examiners, you’ll get 10 different results.” And that means patents get issued for pre-existing ideas – for example, the crustless PB&J patent was granted to two men in 1999 and later acquired by JM Smucker (SJM).

Meanwhile, the economic costs of our flawed patent system are huge. Two Boston University professors found that 20% of the funds that software and some kinds of electronics companies would have spent on R&D are diverted to patent litigation — assessing what they call a patent tax.

And that patent tax could be over $50 billion across all industries. A Stanford University analysis found that since 2010, the smartphone industry spent an estimated $20 billion on patent litigation and patent purchases. And the number of district court patent filings has tripled in the last 20 years to 3,260 in 2010.

Here are the five biggest reasons to scrap our current patent system:

  1. It does the opposite of its original purpose. The process by which Phillips was stripped of the fruits of his labor — that lets Apple profit from Siri – is a compelling example of how companies with deep pockets can turn the patent system into a weapon that accomplishes the opposite of its original purpose. Instead of allowing Phillips to profit from his invention, it let Nuance buy it after diverting Vlingo’s R&D budget to patent lawyers.
  2. It rewards vague concepts rather than specific prototypes. Too often, patents are granted for general concepts described by inventors, and with enough money to pay litigators, those concepts can be used to invalidate the work of engineers who build them into real products. For example, the Times describes Apple’s Invention Disclosure Sessions in which, beginning in 2006, Apple lawyers started listening to engineers describe vague concepts — such as “software that studied users’ preferences as they browsed the Web” — and turned them into patents. Since genius is 1% inspiration and 99% perspiration, we should have a patent system that tilts the bulk of the rewards to those who do the perspiring.
  3. Overworked patent examiners can’t do their jobs right. The Patent Office’s 7,650 examiners received “more than half a million applications last year, and the numbers have kept climbing,” according to the Times. The effect of this is that patent examiners have “two days to research and write a 10- to 20-page term paper on why I think [a patent application] should be approved or rejected,” as an experienced patent examiner told the Times. The standards for granting a patent may be sound but the Patent Office lacks the people to apply them.
  4. Preemptive patent filings by big companies crush innovators. Apple’s application for patent 8,086,604 — now known as the Siri patent even though Vlingo and Nuance were not battling over it — was rejected nine times before Apple’s 10th tweak won it the coveted patent. This is just one example of how “large companies with battalions of lawyers can file thousands of pre-emptive patent applications in emerging industries,” according to the Times.
  5. It diverts money from innovation. It is impossible to know how much better off consumers would be if the money spent litigating patents had been spent on paying engineers to design and build innovative products and services. But the $20 billion spent on smartphone litigation is nearly 23 times Apple’s 2011 R&D budget of $876 million.

If some of that money had been spent improving Siri, maybe it would have told us that we were in Holden instead of Lansing on Saturday.


Siri may arrive on iMacs next, according to Apple patent

Soon Apple desktop customers may also be able to chat with Siri and take advantage of its voice recognition and text-to-speech capabilities.


Source: VentureBeat

A revised patent application from Apple published today suggests that the company’s voice command feature Siri is heading for the desktop, reports Patently Apple.

Siri first debuted last year as one of the most buzzworthy features included on Apple’s iPhone 4S. It goes beyond a simple voice recognition tool, providing iPhone (and iPad) owners with a way to search their contact information, weather, locations, and other utility information that normally required the use of several apps or a web search engine. Currently, the feature is only available on mobile devices. But the new patent suggests that it could end up on the next model of Apple’s all-in-one iMac PC.

The move would definitely make sense for Apple. The company hasn’t released an upgrade for iMacs since May 2011, and many speculated that Apple would make an announcement during its iPhone event earlier in the week. Plus, adding Siri to the iMac would make the product stand out as distinct from Apple’s other desktop PCs.

Apple could be waiting to release a new iMac in late October, which is the same time it plans on releasing its new simplified, product-focused version of iTunes. Usually Apple releases its big iTunes updates to coincide with new versions of its iOS operating system or products, so it’s entirely possible.

The revised patent describes “Electronic Devices with Voice Command and Contextual Data Processing Capabilities” and specifically mentions that the functionality includes controlling a media platform as well as downloading, purchasing, and recommending songs.

via http://www.speechtechnologygroup.com/speech-blog - Now Apple desktop customers can also chat with Siri and take advantage of it's voice recognition and text-to-speech capabilities Source  VentureBeat  A revised patent application from Apple published today suggests that the company’s voice command feature Siri is heading for the desktop, reports Pat ...