Microsoft is looking to take its next-generation offerings to a new level by making them capable of recognizing and understanding speech through Tellme, its consolidated speech technology unit, ZDNet reports.
Microsoft has recently been demonstrating in public how Windows Phones currently handle spoken queries. With the imminent arrival of Mango, Windows Phones will support even more speech functions, including speech-to-text and text-to-speech. The Kinect sensor is expected to get more sophisticated voice-command support this fall, allowing users to search Bing for movies, TV, music and other content by voice.
Ilya Bukshteyn, Tellme Senior Director of Sales and Marketing, said, “As compared to Windows 7, Windows 8 on ARM and Intel slates will be able to recognize many speech commands, which makes sense considering that they will not be optimized for keyboard and mouse input.” He elaborated, “Since Windows 8 is HTML-based, the HTML5 speech tag could allow developers within Microsoft and outside to create applications for Windows 8 that are speech capable.”
The Tellme team is looking to push boundaries beyond speech recognition and venture into conversational understanding, making scenarios even more interesting, Bukshteyn said. On Tuesday, at the SpeechTek Conference in New York, Microsoft explained how ‘conversational understanding’ could work.
“Consider you want to meet a friend in New York for dinner next week. Microsoft officials think, in a couple of years, you will be able to say to your PC, ‘arrange a dinner with Joe in Manhattan on Thursday,’ and Tellme will recognize the query, link to your Facebook account or LinkedIn social graph information to find out which ‘Joe’ you are likely to meet, compare your calendars and use Bing to search for restaurants you have both ‘Liked’ on Facebook.”
In a Tellme blog post on Tuesday, Microsoft explained what can be expected from the Bing/Tellme/social-graph integration: “We see a future where the service will know you: your intent, your social and business connections, your likes and dislikes, your privacy preferences and things that define the context that is important to you. The result will be a speech NUI service that will help you accomplish everyday tasks in a more natural and conversational manner.”
The blog post said that Microsoft envisions a future in which today’s experiences with Kinect for Xbox 360, Windows Phone, or Bing for iPad and iPhone apps can be built upon by enhancing the speech NUI experience to understand more layers of context. Since this is a cloud-based service, users’ interactions will persist over time, enabling them to pick up where they left off, regardless of the device they may be using.
Bukshteyn said that “understanding intent,” or “conversational understanding,” is an effort on Microsoft’s part to make Bing’s results more personalized. Tellme is playing a big role in these efforts because of the volume of speech data it collects and uses to improve the accuracy of its results. Currently, Tellme processes 11 billion ‘utterances’ every year.
The Tellme team is also planning to add support for the Tellme speech cloud to Windows Azure at some point, so that developers will be able to build and support IVR (interactive voice response) apps and services running on Azure.
Apart from this, the team is working on adding a speech programming interface to Windows Phone so that developers can write apps that take advantage of the speech technology built into the phone platform. However, Bukshteyn could not share a time frame for when Windows Phone developers might get this API support.
At the SpeechTek Conference 2011 in New York, Microsoft released a video demonstrating how speech recognition would work on future tablets. The demo includes a tile-based user interface and shows a simple way of sharing pictures using touch and voice. This could also be a hint at what Microsoft is planning for Windows 8.
You can view the demo below: