- This is the entire description of the core pieces of the Google framework from the above paper: "The speech recognition engine is a standard, large-vocabulary recognizer, with PLP features and LDA, GMM-based triphone HMMs, decision trees, STC and an FST-based search."
  References cited in the quote: Proc. IEEE Trans. SAP, May 2000 (for STC); "OpenFst Library," http://www.openfst.org (for the FST-based search).
myMusik.us, by contrast, uses a standard small-vocabulary recognizer with MFCC features, GMM-based triphone HMMs, 1000 tied states, and a standard Viterbi search.
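
To make "Viterbi search" concrete, here is a minimal sketch of Viterbi decoding over a toy two-state HMM with discrete observations. It is not the myMusik.us or Google decoder: the state names (`sil`, `speech`), the observation symbols (`low`, `high`), and all probabilities are made up for illustration. A real recognizer would score MFCC frames with GMMs over tied triphone states rather than look up a discrete emission table.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for a discrete-observation HMM."""
    # best[s] = log-probability of the best path that ends in state s
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s
            prev = max(states, key=lambda p: best[p] + log_trans[p][s])
            new_best[s] = best[prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final state
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Hypothetical toy model: all numbers below are illustrative assumptions.
states = ["sil", "speech"]
log_start = logs({"sil": 0.8, "speech": 0.2})
log_trans = {
    "sil": logs({"sil": 0.6, "speech": 0.4}),
    "speech": logs({"sil": 0.3, "speech": 0.7}),
}
log_emit = {
    "sil": logs({"low": 0.9, "high": 0.1}),
    "speech": logs({"low": 0.2, "high": 0.8}),
}

best_log_prob, best_path = viterbi(
    ["low", "high", "high"], states, log_start, log_trans, log_emit
)
print(best_path, math.exp(best_log_prob))
```

Working in log-probabilities avoids numeric underflow on long utterances, which is why the sketch adds scores instead of multiplying probabilities.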
- Google uses a training set of more than 1 million utterances.
- Google's focus is large-vocabulary recognition.
- http://research.google.com/roundtable/ has a nice video giving a manager's view of Google's speech recognizer. A recurring theme in the video is that Google uses web query data to build its language models. These models capture the relationship between a word and the context in which it occurs, and the larger the corpus, the better the models.
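
The idea of a language model built from query data, where a word's probability depends on its context, can be sketched as a bigram model. This is only an illustration, not Google's method: the three queries are invented, the function names `train_bigram` and `bigram_prob` are hypothetical, and add-one smoothing is an assumption chosen for simplicity.

```python
from collections import Counter


def train_bigram(queries):
    """Count contexts and word pairs over a list of query strings."""
    contexts, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for query in queries:
        words = query.lower().split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])             # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))  # (previous word, word) pairs
    return contexts, bigrams, vocab


def bigram_prob(prev, word, contexts, bigrams, vocab):
    """P(word | prev) with add-one smoothing (an illustrative choice)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


# Invented stand-in for web query data.
queries = ["pizza near me", "pizza hut menu", "weather near me"]
contexts, bigrams, vocab = train_bigram(queries)

p_seen = bigram_prob("near", "me", contexts, bigrams, vocab)
p_unseen = bigram_prob("near", "hut", contexts, bigrams, vocab)
print(p_seen, p_unseen)
```

With more queries, the counts for each context become more reliable and fewer word pairs depend on smoothing alone, which is one way to read the "larger corpus, better models" point.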