Wednesday, August 8, 2012

NoteSwift Launches NoteSwift for AllScripts MyWay

Speech recognition in the medical practice …

NoteSwift LLC has unveiled NoteSwift for the AllScripts MyWay electronic health record (EHR) solution, which works with Dragon Medical Practice Edition from Nuance to take speech recognition to the next level of efficiency. According to the company, the solution can save physicians up to 30 minutes per day.

NoteSwift expands the power of Dragon Medical by enabling an entire patient visit to be captured in AllScripts MyWay from start to finish using speech recognition. Advanced technology enables a continuous dictation style that expedites the documentation process by minimizing pointing and clicking, and eliminating typing for most patient visits.

Features include:

  • Structured data entry: NoteSwift enables information such as medications and allergies to be captured into the EHR via dictation as searchable, structured data (a simple sketch of this idea follows the list).
  • Seamless navigation: NoteSwift enables fast, seamless progression through records by requiring minimal navigation phrases and virtually no manual intervention.
  • Eliminates costly customization: NoteSwift makes MyWay “workflow friendly,” eliminating the need for the extensive customization often required with such products.
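
To make the "structured data entry" idea concrete, here is a minimal, purely illustrative sketch of how a dictated phrase might be mapped onto discrete, searchable EHR-style fields. The field names and matching rules are invented for this example; the announcement does not describe NoteSwift's actual parsing or vocabulary.

```python
import re

# Toy illustration of turning a dictated sentence into structured, searchable
# EHR-style fields. The field names and patterns are invented for this example;
# a real product would rely on a medical vocabulary and far more robust parsing.

FIELD_PATTERNS = {
    "medications": re.compile(r"medications?\s*[:\-]?\s*(.+)", re.IGNORECASE),
    "allergies":   re.compile(r"allerg(?:y|ies)\s*[:\-]?\s*(.+)", re.IGNORECASE),
}

def parse_dictation(utterance):
    """Map a dictated phrase onto named fields instead of leaving free text."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            # Split a spoken list ("penicillin and sulfa drugs") into entries.
            items = re.split(r",|\band\b", match.group(1), flags=re.IGNORECASE)
            record[field] = [item.strip() for item in items if item.strip()]
    return record

print(parse_dictation("Allergies: penicillin and sulfa drugs"))
# -> {'allergies': ['penicillin', 'sulfa drugs']}
```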

“EHRs improve many aspects of a medical practice, but can cause frustration for physicians because they increase the time spent documenting patient visits due to all the pointing, clicking, and typing that’s required,” Chris Russell, M.D., founder of NoteSwift, said in a statement. “Dragon has helped automate the input, but only in a batch mode. NoteSwift significantly improves the input process and allows physicians to spend more time with patients by enabling them to use a familiar, traditional dictation style to input patient information into structured fields.”

via http://www.speechtechnologygroup.com/speech-blog

Tuesday, August 7, 2012

Spansion’s Speech Recognition Coprocessor: Flash Memory with On-Board Search-Logic Power

A hardware approach to improving speech recognition performance…

Spansion is a name that’s probably familiar to many of you, as a supplier of nonvolatile memories. You might be wondering, therefore, what the company’s doing gracing the pages of InsideDSP. Well, hold that thought! Spansion was originally founded in 1993 as a joint venture of AMD and Fujitsu, and named FASL (Fujitsu AMD Semiconductor Limited). AMD took over full control of FASL in 2003, renamed it Spansion LLC in 2004 and spun it out as a standalone corporate entity at the end of 2005.

Spansion’s product line is predominantly derived from a NOR flash memory foundation, in both serial and parallel interface device options, as well as including a comparatively limited number of NAND flash memories. NOR flash memory’s rapid random read speeds make it ideal for fast data fetch and direct code execution applications. Conversely, NOR tends not to be as dense (and therefore not as cost-effective on a per-bit basis) as NAND on a comparable process technology, nor does it keep pace with NAND’s write speeds. Both factors hamper NOR’s ability to compete against NAND in bulk code and data storage applications. And Spansion also has plenty of NOR flash memory competitors in traditional application and customer strongholds.

Hence, the company is striving to diversify and differentiate itself, with a series of logic-enhanced devices that exploit NOR flash memory’s strengths while not being unduly hampered by its versus-NAND shortcomings. Spansion calls them Human Machine Interface Coprocessors, and the first one in the family, the Acoustic Coprocessor, targets speech recognition applications. Spansion’s product materials also sometimes incorrectly reference “voice recognition”; the two terms refer to different applications. The Acoustic Coprocessor focuses on the translation of spoken words (speech) into text; it’s not intended to uniquely authenticate or verify the identity of a speaker.

In defining the Acoustic Coprocessor via a partnership with Nuance, a well known voice technology developer, Spansion was guided by the observation that in conventional speech recognition algorithms, a substantial percentage of the total processing time is spent simply comparing each incoming phoneme (fundamental digitized speech “building block”) against a database of phonemes, striving to identify a closest match (Figure 1):

Figure 1. Upwards of 50% of the total time taken by a speech recognition algorithm can, according to Spansion and Nuance, be spent doing phoneme matching

Successive identified phonemes are then strung together by the speech recognition algorithm and transformed into words, using probabilistic calculations and other matching techniques. The Acoustic Coprocessor, as its name implies, is intended to offload the phoneme-matching portion of the algorithm from the CPU or DSP (Figure 2):

Figure 2. The Acoustic Coprocessor tackles the phoneme-matching function, thereby relieving the system CPU or DSP of this task
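
To picture the work being offloaded, the following deliberately simplified sketch shows the search loop in software: one incoming acoustic feature frame is scored against every entry in a large phoneme table, and the closest entry wins. Real recognizers score frames against statistical phoneme models rather than single template vectors, and the table size and feature dimension below are invented for the demo, but this brute-force "compare against everything" pattern is what makes a CAM-style memory attractive.

```python
import numpy as np

# Simplified view of the search step being offloaded: score one incoming
# acoustic feature frame against every entry in a large phoneme table and
# return the closest match. Real recognizers use statistical phoneme models
# (e.g., Gaussian mixtures) rather than single template vectors; the table
# size and feature dimension here are invented for the demo.

rng = np.random.default_rng(0)
NUM_ENTRIES, FEATURE_DIM = 10_000, 39
phoneme_table = rng.normal(size=(NUM_ENTRIES, FEATURE_DIM))
phoneme_labels = [f"ph{i}" for i in range(NUM_ENTRIES)]

def closest_phoneme(frame):
    """Brute-force nearest-neighbor search over the whole phoneme table."""
    distances = np.linalg.norm(phoneme_table - frame, axis=1)
    return phoneme_labels[int(np.argmin(distances))]

incoming_frame = rng.normal(size=FEATURE_DIM)
print(closest_phoneme(incoming_frame))
```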

Think of the Acoustic Coprocessor as an application-optimized nonvolatile CAM (content-addressable memory). RAM-based CAMs have commonly found use in networking applications, as Wikipedia explains by case study example:

When a network switch receives a data frame from one of its ports, it updates an internal table with the frame’s source MAC address and the port it was received on. It then looks up the destination MAC address in the table to determine what port the frame needs to be forwarded to, and sends it out on that port.

The conventional software-based approach to implementing port forwarding employs a microprocessor, the flexibility of which is overkill for such a focused task, and which is also slower and more power-hungry than the CAM approach. Expand beyond the elementary switch to a more complex router, and the CAM’s superiority over a software-based implementation becomes even more apparent.
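
For reference, the conventional software approach the paragraph describes amounts to a learned lookup table. The sketch below shows that microprocessor-style implementation; a hardware CAM collapses the lookup step into roughly one memory access.

```python
# Software rendition of the switch behavior described above: learn the source
# MAC address on the ingress port, then look up the destination MAC to choose
# an egress port. A hardware CAM performs the lookup in roughly one memory
# cycle; this dictionary is the flexible-but-slower CPU equivalent.

mac_table = {}  # MAC address -> port number

def handle_frame(src_mac, dst_mac, in_port):
    mac_table[src_mac] = in_port        # learn where the sender is reachable
    return mac_table.get(dst_mac)       # None means "unknown: flood all ports"

handle_frame("aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb", in_port=1)  # bb unknown yet
handle_frame("bb:bb:bb:bb:bb:bb", "aa:aa:aa:aa:aa:aa", in_port=2)  # bb now learned
print(handle_frame("aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb", in_port=1))  # -> 2
```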

Similar CAM-like advantages apply to the Acoustic Coprocessor in speech recognition applications. The product combines the capabilities of a fast-access RAM and a nonvolatile phoneme storage device; the consequent two-memory function integration saves board space, power consumption and (potentially) cost. The phonemes are stored locally, versus being “cloud”-based, which enables speech recognition to continue to work even in settings where network data access is sketchy or nonexistent. And the Acoustic Coprocessor’s phoneme-matching response time, Spansion claims, is up to 50% faster than that which a conventional software-based approach can cost-effectively deliver.

The higher inherent performance of the Acoustic Coprocessor, Spansion and Nuance assert, can also be leveraged to improve speech recognition accuracy (Figure 3). The higher access speed of Spansion’s product enables larger phoneme tables, which could be used to make speech recognition more robust by compensating for ambient noise and other factors that can confuse a more elementary implementation. In addition, larger phoneme tables can be used to enable recognition of multiple languages and dialects.

Figure 3. The Acoustic Coprocessor’s faster phoneme search-and-match speeds allow for larger phoneme tables (top graphic), which support incremental languages and dialects, along with user-specific voice recognition capabilities and ambient noise compensation (bottom graphic)

The larger phoneme database also allows for user customization, enabling the system to (for example) respond to the driver’s voice command to change the radio station while ignoring voices coming from vehicle passengers or from the radio itself, a desirable “focusing” feature that can also be enabled via directional microphones.

The 65 nm Spansion Acoustic Coprocessor is nearing tape-out, with initial sampling scheduled for the third quarter of this year and initial production slated for early next year. It will come in various flash memory capacities and also embeds “massively parallel” function-specialized search logic circuitry, with a 768-bit wide, 1.2 GByte/sec bus interconnecting the nonvolatile memory and logic blocks. With power consumption estimated at between 100 mW and 1.5 W depending on capacity and function, it’s not yet necessarily ready for battery-operated applications, but Spansion is already working with several large automotive manufacturers.

Spansion’s next-generation Acoustic Coprocessor aspirations encompass devices that offload from the CPU additional portions of various audio-processing algorithms. And Spansion also plans to expand beyond speech recognition into coprocessors for diverse embedded vision-based applications (Figure 4).

Figure 4. Speech recognition is Spansion’s initial Human Machine Interface Coprocessor focus, but vision-based applications such as face recognition, emotion discernment, gesture interfaces and advanced driver assistance are also on the company’s radar

via http://www.speechtechnologygroup.com/speech-blog

Nuance’s Nina Brings Siri-Like Voice Recognition Features To Mobile Apps

The next wave of speech technology is voice biometrics. Now it is included with Nuance's new virtual assistant, Nina.

Nuance, the company that powers a large number of tools that use voice recognition (including Apple’s Siri), launched its own Siri-like voice-powered “virtual assistant” today that developers can add to their mobile apps. The Nuance Interactive Natural Assistant (Nina) uses the company’s speech recognition technologies and combines them with voice biometrics and an understanding of natural language and the user’s intent to “deliver an interactive user experience that not only understands what is said, but also can identify who is saying it.” At its core, Nina is something akin to a “Siri for apps,” and iPhone and Android developers can now start integrating it into their own apps.

The service consists of three different parts: the Nina Virtual Assistant Persona, an SDK and the Nina Virtual Assistant Cloud that lives on Nuance’s servers and which powers most of the service’s features. Nina currently understands US, UK and Australian English and Nuance promises to launch support for additional languages later this year.

Instead of having to find their way through lots of menus, Nina aims to help users perform a task with just a few spoken sentences and without the need to learn an app’s specific vocabulary.
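
As a rough illustration of that "no fixed vocabulary" idea, the sketch below maps a few different phrasings of a request onto a single app intent. This is generic example code, not the Nina SDK; Nuance's natural-language understanding is statistical rather than a handful of hand-written patterns.

```python
import re

# Tiny illustration of intent resolution: several phrasings of a request map
# onto one app action. The intent names and patterns are invented for this
# example and are not part of any Nuance API.

INTENT_PATTERNS = {
    "pay_bill":      re.compile(r"\bpay(?:ment)?\b.*\b(?:bill|invoice)\b", re.I),
    "check_balance": re.compile(r"\b(?:balance|how much)\b.*\b(?:account|owe)\b", re.I),
}

def resolve_intent(utterance):
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(utterance):
            return intent
    return "fallback_to_menu"   # hand off to the regular touch interface

print(resolve_intent("I'd like to pay my phone bill"))     # -> pay_bill
print(resolve_intent("how much do I owe on my account"))   # -> check_balance
```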

Nuance argues that “the time is now for virtual assistants” and that “the proliferation of voice-enabled assistants is making consumers comfortable asking a device, rather than a person, for information in a very intelligent, conversational way.” While Nuance never quite comes out and says so, the reason why users now feel more comfortable with using these kinds of voice-enabled tools is obviously the success of Apple’s Siri.

Besides the virtual assistant features that aim to make using mobile apps easier and faster, Nina also includes support for Nuance’s VocalPassword tool. VocalPassword uses biometrics to authenticate a user without the need for passwords or PINs. Users simply speak a passphrase and the app will recognize the user.
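
The general shape of passphrase-based voice biometrics can be sketched as comparing an embedding of the spoken attempt against an enrolled voiceprint. The code below is illustrative only: the embed() function is a crude stand-in for a trained speaker-embedding model and the threshold is arbitrary; it does not reflect how VocalPassword actually works internally.

```python
import numpy as np

# Generic shape of passphrase-based voice biometrics, not Nuance's actual
# VocalPassword implementation: an enrollment recording is reduced to a
# fixed-length "voiceprint" vector, and later attempts are accepted when
# their embedding is close enough to it.

def embed(audio):
    """Placeholder embedding: a normalized spectral summary of the audio."""
    spectrum = np.abs(np.fft.rfft(audio))[:64]
    return spectrum / (np.linalg.norm(spectrum) + 1e-9)

def verify(enrolled_print, attempt_audio, threshold=0.8):
    score = float(np.dot(enrolled_print, embed(attempt_audio)))  # cosine similarity
    return score >= threshold

rng = np.random.default_rng(1)
enrollment_audio = rng.normal(size=16_000)    # stand-in for a recorded passphrase
voiceprint = embed(enrollment_audio)
print(verify(voiceprint, enrollment_audio))   # same audio -> True
```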

Nuance’s first partner for this service is the USAA financial services group. The two companies are launching a pilot of USAA’s mobile app with Nina this month and plan to launch a Nina-powered app for all USAA members early next year.

It’s hard to say how well this service really works without testing it in the real world. We’ll keep an eye out for Nina-powered apps, though, and will report back once we get a chance to spend some quality time with this new service.


via http://www.speechtechnologygroup.com/speech-blog

Sunday, August 5, 2012

Speak to type - DOMESTI-TECH

Speech recognition for dictation on Apple's new operating system, Mountain Lion - how does it work…

The first talking and typing programs were pretty frustrating: a lot of corrections and not much output. Things got better over time; programs like Dragon NaturallySpeaking changed the way they translated the spoken word, and other programs came along that used more processing power to deliver better speech recognition. Recently, with the advent of Apple’s Mountain Lion operating system and Google’s speech recognition service, dictating to your computer has become much easier.

I decided to give this new technology, or at least recently updated technology, a try, so this article was dictated entirely using Apple’s software built into the Mountain Lion operating system. My biggest difficulty so far is that I can’t think of anything to say.

Writing an article by talking instead of typing is not my normal mode of creation. I like to think about what I’m typing, and a lot of writing, frankly, happens in rewriting. However, so far I’ve only had to make two corrections. This could really start to grow on me.

How they do it

Anyone who’s been watching the tech news lately will know that some people are up in arms about Apple’s privacy policies concerning the use of its voice assistant, Siri. While certainly not perfect, Siri has some major advantages. Using Siri in the car is one of the greatest inventions of all time, in my estimation. I can ask for a local phone number as I drive along, I can ask how far we are from our destination, I can even check the weather, all with my hands never leaving the wheel and my concentration never leaving the road.

The newest Apple speech recognition built into Mountain Lion isn’t like Siri: your machine does all the translating using Apple’s software licensed from Nuance (makers of Dragon NaturallySpeaking). The reason it’s so accurate is better interpretation technology. The reason Siri works so well is that your voice is uploaded on the fly to a huge data center for transcription. Bigger machines mean better translation of the spoken word. Google’s speech-to-text works the same way.

For privacy advocates, that represents a huge issue. Everything you say is being forwarded to the Apple or Google data centers, worked on by large, powerful machines and then stored for future reference so that the service becomes more accurate the more it’s used. This means Apple has a record of every word I’ve spoken to Siri. I’m not sure I’ve ever related any embarrassing information to my phone, but it would be interesting to check those records myself. Until I lost consciousness from boredom, that is.

Another big difference between these talk-to-text systems and those tried in the past is that words do not appear on the screen while you’re talking.  I always found that incredibly distracting.  I used to spend most of my time thinking about the mistakes being made instead of what I was trying to write/say.  On Mountain Lion I can go for about 30 seconds before the speech buffer is full, or I can stop anytime by pressing the return key.     Both the Google system and the Apple system wait until you are finished before trying to translate what you said.

The real deal

Both Apple and Google have spent years perfecting this technology. Apple has had voice control of the Mac since the ’90s. Google search using voice came out about two years ago. The technology has now matured to the point that it’s really convenient and almost hassle-free.

Although we all expected this moment to come long ago, it’s still nice that it has finally arrived. Speech recognition is available to everyone for little or no cost. Google doesn’t offer full voice input for all its applications yet, but that is sure to come. Apple’s implementation is system-wide: every application that accepts text input can use the speech recognition built into the operating system, and it’s also available on iPhones and iPads. Google offers the service on its Nexus phone; all you have to say is “Google” to start the service. Most Android phones (and iPhones) have the Google voice app as well. For a start, this means less danger from texting while driving, which is a very welcome change. It also means that people who are disabled or just aren’t very good typists will now be able to compose essays, books, letters or just emails to their heart’s content.

This is a good thing.


via http://www.speechtechnologygroup.com/speech-blog

Friday, August 3, 2012

Siri criticism analysis: Apple’s Siri is the future, like it or not

Speech recognition as a user interface is here to stay…

Hate Siri all you want – it’s still the future

By Zach Epstein | Aug 2nd, 2012 at 11:20 AM

I recently penned a quick piece on Apple’s (AAPL) Siri and Google’s (GOOG) new voice command support in Android 4.1 Jelly Bean, explaining that while there are undoubtedly issues to iron out, services like these are going to change the way we interact with devices. But people are still fed up with Siri, and rightfully so, says David Pogue in the latest issue of Scientific American. “We’re used to consumer technology that works every time: e-mail, GPS, digital cameras,” he wrote. “Dictation technology that relies on cellular Internet, though, only sort of works. And that can be jarring to encounter in this day and age.”

Pogue goes on to say that while we have grown accustomed to things that just work, people are not seeing the forest for the trees. These technologies are only just emerging, and we shouldn’t “throw the Siri out with the bathwater.” The virtual assistant portion of Siri works very well and the dictation portion will get there soon enough.

“Free-form cellular dictation is a not-there-yet technology,” Pogue notes. “But as an interface for controlling our electronics, it makes the future of speech every bit as bright as Siri promised a year ago.”



via http://www.speechtechnologygroup.com/speech-blog

Thursday, August 2, 2012

Real-Time Phone Translation Gets Real

Bringing down the language barrier in our global community will help people expand beyond their borders more than ever before, and speech recognition and text-to-speech technology can help with that…

Real-time translation over telephone connections — one person speaking in English and another in Spanish and both hearing immediate translations, for instance — clearly has great potential for businesses and consumers. Carriers also will benefit, since traffic would grow if people who speak different languages suddenly could communicate. Lexifone CEO and Chief Scientist Ike Sagie tells IT Business Edge blogger Carl Weinschenk that the secret is marrying speech recognition and telecommunications technologies.


Weinschenk: What is Lexifone?
Sagie: Lexifone, very simply put, is the first and probably only fully automatic phone interpreter. When people pick up a phone — any phone — and use any carrier, they will have the option of having the call translated in real time.


The 50,000-foot view is that we are the first to combine two different technologies into one: telecommunications and speech recognition. The speech recognition technology is to some extent the same as on smartphones. We use the telecommunications technology of conference calls, VoIP and other technologies that usually do not include speech recognition. We combine very advanced telephony technology with voice recognition technology.


Weinschenk: Is the challenge to doing this squeezing all the functionality into the device?
Sagie: We do not package anything on any phone. We do this as a service. The way you access Lexifone is by dialing a telephone number as you would 411 or any other service. You can do it with a regular phone, a smartphone, Skype or any other type of phone. You access our service via the telephone number provided locally all over the world. We have a large cloud-based service that is very, very powerful. It does all the processing. It all is done on our very big servers. Nothing is being processed in your phone.


You do not need Internet access or Wi-Fi or any other Internet connection. This is extremely important if you are traveling abroad. People are very sensitive when they are traveling abroad because roaming is very expensive.


Weinschenk: So the amount of computing necessary to perform these tasks makes it impossible to do on a device.
Sagie: There is too much number crunching to give you any high level of accuracy … [Existing services are not useful in real time scenarios] because the telecommunications functionality is detached from the application processing. This combination is very important.


Weinschenk: What do you mean by “combining” the elements?
Sagie: Combining means that they work in the same circuitry. It is the same loop for the entire process, from when the person dials a number. We act as the operator. We pick up the call, process it, do the speech recognition and complete the cycle by connecting to the other party and establishing kind of a three-way conference call with the caller and the person called, with Lexifone in the middle.
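
The flow Sagie describes can be pictured as a per-utterance pipeline sitting between the two call legs: recognize the speech, translate the text, then synthesize it in the listener's language and play the result on the other leg. The stubs below are placeholders, not Lexifone's engines; they only show the shape of that loop.

```python
# Rough shape of the "operator in the middle" flow described in the interview.
# The three stubs stand in for real speech recognition, machine translation
# and text-to-speech components.

def recognize(audio_chunk, language):
    # Stand-in for speech-to-text: pretend the audio was already transcribed.
    return audio_chunk["transcript"]

def translate(text, source, target):
    # Stand-in for machine translation: a one-entry phrasebook.
    phrasebook = {("en", "es"): {"good morning": "buenos días"}}
    return phrasebook.get((source, target), {}).get(text.lower(), text)

def synthesize(text, language):
    # Stand-in for text-to-speech: describe the audio that would be played.
    return f"<{language} audio: {text}>"

def bridge_utterance(audio_chunk, speaker_lang, listener_lang):
    text = recognize(audio_chunk, speaker_lang)
    translated = translate(text, speaker_lang, listener_lang)
    return synthesize(translated, listener_lang)  # played on the other call leg

print(bridge_utterance({"transcript": "Good morning"}, "en", "es"))
# -> <es audio: buenos días>
```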


This combination is unique to us to the best of my knowledge. It has been in development for three years. This level of seamless integration between telephone and voice recognition is unique. The entire concept is unique.


Weinschenk: What is the overall state of speech recognition?
Sagie: It’s in constant progress. Some breakthroughs in speech recognition have been made in the last year or so. The state of the art is moving very nicely. Today recognition is at a very high level of accuracy. The state of the art is advancing and we will see, say five years from now, very high-recognition accuracy levels. We already are in the domain of over 85 percent. The advances can get us to the 95 or 98 percent point. We no longer are in the era of very low accuracy.


Weinschenk: What you are doing — and perhaps products from competitors and other types of speech recognition tools — seems to be a great opportunity for carriers.
Sagie: The bottom line is that we can get billions of minutes of air time per year for the product. This is a very tempting business opportunity for operators worldwide.


The reason is very simple. Operators today have more or less saturated the air time. People talk as much as they want. The challenge to increasing revenues is adding value and premium content. You don’t have new reasons to use the phone. Lexifone opens up a completely untapped reservoir, a repository of air time. You now will be able to talk to people who do not speak your language. Until now, it did not occur to people to just pick up the phone and talk to a colleague who speaks Chinese. In the past, email and other tools were the means of communicating. Now they will start using air time. The potential is huge for operators.


Weinschenk: Lexifone appears aimed at both consumers and business users.
Sagie: We appeal to both with the same level of excitement. For businesses, the advantages more or less go without saying. It will benefit large enterprises, governments themselves, hospitals and the entire travel industry — any organization that encounters language barriers. And it’s not just for expats. In the U.S., for instance, there is a need for understanding Spanish by people who don’t speak it.


For consumers, the way to use Lexifone is to register like you would for the Skype Out service. … The way you register is to go onto the site. You prepay $10 or $50, or you can subscribe for a monthly charge. Once your phone is registered, you can use the number wherever you are. You call a local Lexifone number, you are directed to dial the number you want to call, and then you are set. We are working on a version for smartphones, which features automatic dialing from the device’s phone book.


We have 15 languages and growing. We will add a new language on average every two months. The service also distinguishes, for instance, between American English, Australian English and English English. It recognizes Castilian Spanish and Mexican Spanish and U.S. Spanish and other language variants.


Weinschenk: Do you support translations in conference calls — when more than two languages are being spoken?
Sagie: We do now, to some extent. We also want to support multi-language conference calls. Today, during a conference call, it is very simple. You create the conference call yourself and then dial up Lexifone as another party to the call. As soon as it is on, it listens to all participants. Today, participants simply announce which language they are speaking.


We are in contact with a number of carriers. I can’t disclose what we are discussing or make an announcement. The service today requires people to dial in to Lexifone and then dial the call. If the service were offered by a carrier — say, Verizon — users would not need to dial a separate number. They would dial the destination directly, and if they want translation, the carrier would give them the means to do it.


Weinschenk: It sounds like the science and business of speech recognition is moving ahead quickly.
Sagie: The entire field is progressing, and the pace of that progress is accelerating. I predict that in the next five to 10 years we will get to almost 100 percent accurate speech translation.

via http://www.speechtechnologygroup.com/speech-blog

Wednesday, August 1, 2012

Microsoft Partnership Focuses on Voice Recognition

More thoughts about the ICSI/Microsoft partnership…



We probably all remember the humorous scene in Star Trek IV: The Voyage Home where Scotty attempts to give voice commands to a twentieth-century computer, even trying to use the mouse as a microphone. Or what about the much creepier human-computer interactions with HAL in 2001: A Space Odyssey? Clearly, the dream of voice-operated computer systems has been with us for a long time. With a new partnership announced this week between Microsoft and the International Computer Science Institute (ICSI), that dream is perhaps coming closer to reality.

This announcement comes at a time when voice recognition technologies are already becoming more prevalent. Who hasn’t dialed into a phone system that asked for voice input rather than (or in addition to) key presses? And I’m sure just about everyone has seen Apple’s iPhone commercials with various celebrities putting Siri to the test, even if they haven’t used Siri on an iPhone personally. However, these voice implementations are far from what we see in HAL or the Enterprise.

The reason the current implementations fall short comes down to the concept of prosody, and it is in this linguistic area that the ICSI/Microsoft partnership will begin its research efforts. ICSI is an independent computer science research institute, with affiliations to the University of California at Berkeley. Although ICSI studies a range of computer technologies, director Roberto Pieraccini’s background is in speech technology; in fact, he’s recently published The Voice in the Machine: Building Computers That Understand Speech (MIT Press), which examines the history of computers and voice technology stretching back six decades.

“We get the benefit of working with the world-class people at Microsoft, but also get to work on real problems,” Pieraccini said of the partnership. “It’s very important for us to work on real problems, on real data, which we don’t have and Microsoft has.” As far as what this partnership could achieve, Pieraccini said, “Eventually, we would like to have speech substituting keyboards and mice. We would like to be able to give commands and to interact with machines, not only at the consumer level like we do with Siri or Google Voice Search, but also at the level of doing more important things.”

I don’t use an iPhone, and if I did, I doubt that I’d use Siri. I’ve always felt it acts more like a novelty — best for a little humor rather than getting anything significant done. Even on my Android smartphone, I use the voice capabilities rarely. I’ve used voice-to-text for hands-free email responses while driving once or twice. When I’ve used Google Voice Search, it’s pretty hit-or-miss whether I get what I intended right away. And in any case, these aren’t the sort of tasks a Microsoft Exchange Server administrator, for instance, is greatly concerned with.

So, we come back to prosody. “Speech conveys much more than just the words,” said Andreas Stolcke, principal scientist with the Conversational Systems Lab (CSL) at Microsoft, and a key member of this partnership. “Things like the emotional and maybe even physical space of the speaker, the nuances of meaning that would be ambiguous if you didn’t have the actual intonation and the timing of what is being said. This is a group of phenomena that linguists call prosody.” As a music-lover (and wannabe musician), I like to think of prosody as the natural music of language.

Current voice technologies are based on decoding speech into literal transcriptions of the words, then turning those words into commands. But how does the computer tell if you’re making a statement or asking a question? Or how does it deal with sarcasm (of which I’m all too often guilty)? “Our speech interfaces right now ignore this type of information,” Stolcke said. “One of the big goals of our collaboration is to look into ways of extracting prosodic information and other ‘beyond the word’ information about speaker state and so forth, and make that available to computers that people interact with.”
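
As a small, concrete example of the kind of "beyond the word" signal being described, the sketch below estimates a rough pitch (F0) contour with frame-wise autocorrelation and then checks whether the contour rises toward the end of an utterance (question-like) or falls (statement-like). This is a toy illustration, not the researchers' method; the frame sizes and pitch range are arbitrary choices.

```python
import numpy as np

# Toy prosody feature: estimate a pitch (F0) contour frame by frame, then ask
# whether it rises toward the end (question-like) or falls (statement-like).

def estimate_f0(frame, sr, fmin=75.0, fmax=400.0):
    """Crude autocorrelation pitch estimate for one frame, in Hz."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    if hi >= len(corr) or corr[0] <= 0:
        return 0.0
    lag = lo + int(np.argmax(corr[lo:hi]))
    return sr / lag

def intonation(signal, sr, frame_len=1024, hop=512):
    f0 = np.array([estimate_f0(signal[i:i + frame_len], sr)
                   for i in range(0, len(signal) - frame_len, hop)])
    third = max(1, len(f0) // 3)
    return "rising" if f0[-third:].mean() > f0[:third].mean() else "falling"

# Synthetic "utterance" whose pitch glides from 120 Hz up to about 220 Hz.
sr = 16_000
t = np.linspace(0, 1.0, sr, endpoint=False)
sweep = np.sin(2 * np.pi * (120 * t + 50 * t ** 2))  # instantaneous f0 = 120 + 100*t
print(intonation(sweep, sr))  # -> rising
```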

This notion of improving spoken communication with computers fits well with Microsoft Research’s focus on Natural User Interface, a project which led to the gesture-based Kinect for Windows. Elizabeth Shriberg is another principal scientist at Microsoft’s CSL involved with the ICSI/Microsoft partnership. “One of the big challenges that we’re actually focusing on is to develop a common framework to a number of these types of capabilities,” Shriberg said, “where prosodic cues are used to do something — some task. We’ve started doing this already; it’s been implemented in a prototype in a lab at Microsoft.”

This is one of the most exciting aspects of this project: Although carried out in a lab environment, it’s clear that the intent is to find real-world applications for the technology. Shriberg said, “We don’t want to be the type of researchers, or this should not be the type of project, where it’s sort of Ivory Tower and it stays out there forever. We took problems where we know there’s a need, we know that the systems right now don’t perform perfectly, and we said, hey, prosody could probably help on this particular problem.”

The researchers couldn’t specify anything about the prototype they’re currently working with, nor could they predict when the research would result in something that would go in a marketable product — nonetheless, it’s good to know that is their aim. It’s not hard to imagine the ways a truly intelligent voice technology could be used for IT management. Exchange Server 2013 Preview has simplified the management systems into one web-based console, the Exchange Administration Center (EAC) — but wouldn’t giving voice commands be even easier than point-and-click?

So watch out, Siri; watch out, Google Voice Search — or better yet, step it up! True voice management of computer systems could be all that much closer due to this ICSI/Microsoft partnership. Now I’d like to go to the coffee machine and give it a command such as, “Tea, Earl Grey, hot,” to get a nice beverage — but instead I’ll be forced to punch buttons like some kind of sucker. Oh well.


via http://www.speechtechnologygroup.com/speech-blog