Nuance just released a dictation tool (Dragon Dictation v1.0.1) for the iPhone. http://www.mobilecrunch.com/2009/12/16/nuance-updates-dragon-dictation-app-to-let-you-keep-your-contacts-secret/
Generic transcription of speech to text
Google Mobile App for the iPhone and Nuance's Dragon Dictation are the only tools with favorable ratings (i.e., mostly 4s and 5s). You would expect Microsoft/Tellme to have an offering too, right? Strangely, as of Dec. 2009, they still don't have an app for the iPhone.
VoiceOnTheGo (http://www.voiceonthego.com) and Dial2Do (http://dial2do.com) are smaller players with similar offerings.
Keeping in touch with your friends and family (i.e., your contacts)
Infinear (http://www.infinear.com) focuses on proper nouns: your father's name, mother's name, children's, friends', restaurants... These are EVEN more important to an average user than generic words. If I can put you in direct contact with a person whose name you can say, then there is no need for transcription. We store the entire audio on our servers. Every client (including your phone) can play back these messages for offline access. Hooking all this up via Yahoo Mail or GMail opens up the largest set of collaborative users on this planet: 267 million Yahoo users and 100 million GMail users.
Wednesday, December 16, 2009
Thursday, December 10, 2009
www.infinear.com An EC2 hosted speech driven solution
www.infinear.com is the official site of Sanjay Research's speech recognizer. It reads out websites, blogs and now Yahoo Mail too. You can listen to unread emails, reply to specific ones and even compose emails addressing recipients by name. All hands-free and driven by your voice. Check out http://www.infinear.com for more details.
In future blogs, we will discuss the technology powering infinear.com and the process of hosting an Asterisk driven solution on Amazon EC2. Enjoy...
Saturday, August 22, 2009
August 2009 update on US handset sales
http://news.cnet.com/8301-1035_3-10309605-94.html?tag=mncol;mlt_related
The latest numbers support my claim from a few weeks back: the handset market in the US is bipolar too. RIM and Apple dominate the high-end smartphone segment. The midrange market is ceding to the high end. The low end matters to price-conscious consumers. Together, the article claims, these 2 categories will make up 80% of the market by 2014.
The billion $ unanswered question is.... of the 80%, how much is accounted for by the high end? I will take a shot at this. AT&T just announced that data plans are mandatory for all smartphones. Clearly, AT&T is pushing iPhone apps tied to data plans, and is building out its network to accommodate future growth in this data space. But this segment (in spite of huge profits for Apple and AT&T) will not grow beyond 5% of the handset market. If AT&T unlocks iPhones and introduces competition for voice+data plans, this will grow to 25-40%. This will happen in 2012.
Between 2009 and 2012, the handset market belongs to the low end. Margins are high in the apps space, but there are so few of them. If user experience improves by orders of magnitude, the low end can join the party!! And party on till 2014 and beyond....
Sanjay Research will focus on the low end with its voice offerings and, hopefully, take a slice of the apps space.
Tuesday, August 11, 2009
Technology behind myMusik.us
Google's 411 service (1-800-goog-411) is a very interesting deployment of speech recognition. It is Google's entry into speech transcription and speech-driven search. http://research.google.com/archive/goog411.pdf is a good description of its internals. The myMusik.us architecture shares a lot of similarities with it, and I would like to compare and contrast the key attributes:
- This is the entire description of the core pieces of the Google framework from the above paper: "The speech recognition engine is a standard, large-vocabulary recognizer, with PLP features and LDA, GMM-based triphone HMMs, decision trees, STC [11] and an FST-based search [12]."
Proc. IEEE Trans. SAP, May 2000
[12] “OpenFst Library,” http://www.openfst.org.
myMusik.us uses a standard small-vocabulary recognizer, with MFCC features, GMM-based triphone HMMs, 1000 tied states and a normal Viterbi-based search.
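To make "a normal Viterbi-based search" concrete, here is a minimal sketch of Viterbi decoding over a tiny discrete HMM. This is not the myMusik.us code: the two states (`sil`, `speech`), the `low`/`high` observation symbols and all probabilities are made-up toy values, and a real recognizer would score MFCC frames with GMMs rather than look up a discrete emission table.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a discrete HMM."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {
        k: to_log(v) if isinstance(v, dict) else math.log(v)
        for k, v in table.items()
    }


# Toy model (hypothetical numbers, for illustration only)
STATES = ["sil", "speech"]
LOG_START = to_log({"sil": 0.8, "speech": 0.2})
LOG_TRANS = to_log({
    "sil": {"sil": 0.7, "speech": 0.3},
    "speech": {"sil": 0.2, "speech": 0.8},
})
LOG_EMIT = to_log({
    "sil": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
})

if __name__ == "__main__":
    log_prob, path = viterbi(
        ["low", "high", "high", "low"], STATES, LOG_START, LOG_TRANS, LOG_EMIT
    )
    print(path, math.exp(log_prob))
```

Working in log space keeps the scores from underflowing on long utterances, which is why the toy tables are converted with `to_log` before decoding.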
- Google uses a training set that has > 1 million utterances
- Google's focus is a large vocabulary.
- http://research.google.com/roundtable/ has a nice video on a manager's view of Google's speech recognizer. A common theme in the video is that Google uses web query data to develop its language models. The relationship between words and the context in which a word occurs influences Google's language models. The larger the corpus, the better the models.
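As a rough illustration of how query text can shape a language model, the sketch below counts word pairs (bigrams) in a handful of queries and turns them into smoothed probabilities. It is only a toy: the queries are invented, add-alpha smoothing is my assumption, and nothing here is taken from Google's actual system.

```python
from collections import Counter, defaultdict


def train_bigrams(queries):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for query in queries:
        tokens = ["<s>"] + query.lower().split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    return counts


def bigram_prob(counts, prev, word, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(word | prev)."""
    following = counts.get(prev, Counter())
    total = sum(following.values())
    return (following[word] + alpha) / (total + alpha * vocab_size)


# Hypothetical query log, for illustration only
QUERIES = [
    "pizza near me",
    "pizza hut",
    "pizza near palo alto",
    "coffee near me",
]

counts = train_bigrams(QUERIES)
# Vocabulary = every token that was seen following another token
vocab = {word for following in counts.values() for word in following}

if __name__ == "__main__":
    for candidate in ("near", "hut", "alto"):
        print(candidate, bigram_prob(counts, "pizza", candidate, len(vocab)))
```

With more queries the counts cover more contexts, so fewer estimates fall back on the smoothing term, which is one way to read "the larger the corpus, the better the models".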
Saturday, August 8, 2009
Hey, it's mid-2009 and the mobile world looks like it's finally entering the information age!! About time, eh?? You have read about the world going gaga over the Apple iPhone, the Palm Pre or the RIM BlackBerry. That's 3% of worldwide phone shipments, accounting for 35% of profit. Deutsche Bank's Brian Modoff has analyzed these profitability trends quite well: http://www.wikio.com/themes/Brian+Modoff
These figures focus on the manufacturers. What about the consumers? Where are they? What do they want? Where are they headed? What's in it for Sanjay Research? I see a definite bipolar worldwide market: a high-end (>$100) smartphone market, mostly adopted by the US, Europe, Japan, S. Korea and pockets of the Middle East. The rest of the world (especially the fastest-growing market, India) will stay in the sub-$50 phone range: http://trendsniff.com/2009/02/22/mobile-subscribers-china-india-2009/