Big Data in K-12: Is Voice Technology Talking to You Yet? (Part II)

Education Technology’s “Next Big Thing?”

My last column, “Big Data in K-12: Attack of the Recommendation Engines,” explored the big ideas and market drivers behind the surge of entrepreneurial initiatives that see advances in big data technology as key to empowering intelligent, real-time differentiated instruction. While not immediately obvious, voice recognition technology, which has advanced considerably alongside data science, relies on specialized recommendation engines to navigate the speech-to-text hurdle and then derive meaning and reasoning. We’re far closer to speech as a universal computer interface than many realize, and examples of educational products are beginning to surface. Last spring Stanford University’s Prof. Patrick Suppes threw down a gauntlet for the ed tech industry by identifying machine recognition of “emotional speech” as the final challenge to realizing educational technology’s vast potential. Read on to learn more about where we stand and some interesting examples of voice-powered educational products. (These articles are based in part on my recent “View From the Catbird Seat” presentation at EdNET 2012.)

Speech as Computer Interface

We’re increasingly accustomed to call center voice recognition systems deployed for customer service. It’s not yet like talking to a live agent, but many of them are robust enough to help much of the time, leaving connection to live agents for special needs and backup when the caller bails from the machine. My brother Stuart, a commercial lawyer specializing in leasing, has been using voice recognition software to dictate and edit business documents for years (most recently he’s using Nuance Communications’ Dragon NaturallySpeaking 12). What you may not realize is that today’s PC operating systems have “core voice recognition technology (VRT)” baked in (e.g., Microsoft’s Windows 7 and Apple’s OS X Mountain Lion). Public recognition of VRT’s increasing power was reinforced in February 2011 when IBM’s Watson computer beat expert human competitors on TV’s popular Jeopardy! game show. In a subsequent article on the feat, Watson team leader David Ferrucci conjectured a future educational scenario where Watson could be assigned to read a book and then “discuss” it with a student or young adult who has read the same book.

For anyone still not aware of VRT’s advances, Apple’s Siri, launched in October 2011 with the iPhone 4S, blasted it into the public imagination. The technology underlying both Mountain Lion and Siri is licensed from Nuance Communications, a VRT pioneer. As Wikipedia nicely puts it, quoting Steve Wildstrom, “Siri is an application that combines speech recognition with advanced natural language processing. The artificial intelligence, which required both advances in the underlying algorithms and leaps in processing power both on mobile devices and the servers that share the workload, allows software to understand not just words but the intentions behind them.” According to Scott Traylor, Chief Kid, 360Kid, Siri was born out of an artificial intelligence project started in 2003 by the U.S. Government. Funded by DARPA as part of its Personalized Assistant That Learns project and called the Cognitive Assistant That Learns and Organizes (CALO), it remains to date the most ambitious artificial intelligence project in the U.S. Government’s history. These underpinnings of Siri’s technology were developed at Stanford University and SRI International (formerly the Stanford Research Institute) and later acquired by Nuance. Some futurists see speech becoming a dominant computer interface in the next few years.

Voice Recognition Growing Up

Scott Traylor reminded me of one of the earliest uses of VRT in a K-12 product, back in 2000, in the Soliloquy Reading Assistant, a “guided reading” system which “listened” to the student reading a story and identified mispronunciations and other errors. (The product survives as Scientific Learning Reading Assistant™.) Traylor explained, “The technology could not interpret children’s voice patterns very accurately, but even without voice models, it could tell you where you were within a reading passage IF AND ONLY IF the technology knew what it was you were trying to read. So if it had the entire text of The Cat in the Hat in memory, the technology could tell where the student was in the story and if there were reading errors.”
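
To make Traylor’s point concrete, here is a minimal Python sketch of how software that already knows the passage can follow a reader and flag errors. It is only an illustration, not how Soliloquy or Scientific Learning actually built their product: the function name track_reading, the three-word lookahead window, and the sample sentence are all my own assumptions, and the “heard” words are assumed to arrive from some upstream speech recognizer.

```python
def track_reading(passage, heard, window=3):
    """Follow a reader through a KNOWN passage (toy illustration).

    passage: the full text the student is supposed to read.
    heard:   words reported by a (hypothetical) speech recognizer.
    window:  how many upcoming words to search for each heard word.

    Returns (position, errors). position is the index of the next
    expected word; errors is a list of (kind, expected, heard) tuples.
    """
    expected = [w.strip(".,!?").lower() for w in passage.split()]
    pos = 0
    errors = []
    for word in heard:
        word = word.lower()
        lookahead = expected[pos:pos + window]
        if word in lookahead:
            # The reader may have skipped a word or two; record those.
            skipped = lookahead.index(word)
            for missed in expected[pos:pos + skipped]:
                errors.append(("skipped", missed, None))
            pos += skipped + 1
        elif pos < len(expected):
            # Nothing nearby matches, so treat it as a misreading
            # of the word the reader should be on.
            errors.append(("misread", expected[pos], word))
            pos += 1
    return pos, errors


passage = "The cat sat on the warm mat by the door."
position, errors = track_reading(
    passage, ["the", "cat", "sat", "in", "the", "warm", "mat"]
)
print(position, errors)
```

Because the expected words are known in advance, the tracker only has to decide whether each sound matches one of a handful of upcoming words, which is a far easier problem than open-ended dictation.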

One of VRT’s advances since then has been more sophisticated voice models using the digital signatures of spoken sounds and pattern recognition technology. Oversimplifying the process, the first step is to convert speech to text and, second, to interpret the text. The science behind this is sometimes called natural language processing or computational linguistics. Two strategies are used to improve accuracy—identifying the speaker as having one of a number of known speech patterns that I like to think of as accents, and using context for interpretation. As explained to me by Owen Lawlor, Director, Strategic Technology, Victory Productions, the logic behind both stages, called an “ontology” by data scientists, is sometimes represented by massive probability tables and concept maps tuned to the subject of discourse.
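
The “using context” strategy can be sketched in a few lines of Python. This is a toy under loud assumptions, not Nuance’s or anyone else’s actual engine: the probability tables, the acoustic scores, and the names CONTEXT_TABLES and pick_transcript are invented for illustration. The idea is simply that when the recognizer is unsure between sound-alike transcripts, a table tuned to the subject of discourse tips the balance.

```python
import math

# Hypothetical word probabilities for two subjects of discourse.
# Real systems use far larger tables; these numbers are made up.
CONTEXT_TABLES = {
    "math": {"sum": 0.05, "some": 0.005, "two": 0.04, "too": 0.002},
    "chat": {"sum": 0.002, "some": 0.05, "two": 0.01, "too": 0.04},
}
FLOOR = 1e-4  # probability assigned to words missing from a table


def context_score(words, context):
    """Sum of log-probabilities of the words under one context table."""
    table = CONTEXT_TABLES[context]
    return sum(math.log(table.get(w, FLOOR)) for w in words)


def pick_transcript(candidates, context):
    """Choose the best transcript for a given context.

    candidates: list of (text, acoustic_log_prob) pairs, as might be
    produced by a (hypothetical) speech-to-text front end.
    """
    best = max(
        candidates,
        key=lambda c: c[1] + context_score(c[0].split(), context),
    )
    return best[0]


# Two sound-alike guesses; the recognizer slightly prefers the first.
guesses = [("some of two", -1.0), ("sum of two", -1.2)]
print(pick_transcript(guesses, "math"))
print(pick_transcript(guesses, "chat"))
```

With these made-up numbers, the same audio resolves to “sum of two” in a math lesson and “some of two” in casual conversation, which is the sense in which the tables are “tuned to the subject of discourse.”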

Richard “Rick” Mack, VP Communications, Nuance, told me that for Nuance products, this is accomplished by their “natural language framework,” which is repurposed for different industries and applications, essentially serving as the meaning and reasoning engine. The “framework” accepts spoken, typed, swiped, and scanned document input. This is what powers medical transcription systems and is licensed to Ford for dashboard systems control (e.g., there are expected to be about 4 million Fords and Lincolns on the road equipped with MyFord Touch by the end of this year, a number expected to reach 13 million by 2015) and for TV voice control in Samsung and LG TVs. Mack added that IBM and Nuance have a new partnership with the “Watson” team to develop meaning and reasoning engines for health care. It’s the framework’s pre-populated probability tables that, like a human misunderstanding a spoken statement, can lead to Siri’s sometimes goofy responses as featured at stuffsirisaid.com. A sign that VRT is in the ascendant is that Nuance is building a Natural Language Understanding (NLU) research facility in Silicon Valley (Sunnyvale-Mountain View), which will eventually employ 100 to 200 researchers.

Carnegie Mellon University’s Language Technologies Institute (LTI) is another leading research center for VRT with a wide range of projects under way. Carolyn Rosé, CMU Associate Professor at LTI and the Human-Computer Interaction Institute (HCII), told me that current projects related to K-12 applications in her research group include automated essay grading technology, conversational agents that moderate student online discussions, technology for analyzing individual student contributions to collaborative projects, and conversational agent technology to draw out reflection using directed lines of reasoning.

CMU’s LTI has at least one successful commercial spinoff, Carnegie Speech, whose tag line is “The Intelligent Language Learning Company,” which closed a $3.4 million Series B fund last May. It offers three product lines—Carnegie Speech Assessment (English language proficiency), NativeAccent (English speech training and foreign accent correction), and Climb Level 4 (Aviation English proficiency). The Aviation English product responds to the imminent International Civil Aviation Organization (ICAO) deadline for international pilots and controllers to demonstrate ICAO Level 4 English proficiency. Does Carnegie Speech technology work? According to the firm’s website, NativeAccent improved pronunciation scores up to 100%, decreased speech training time and costs up to 50%, and delivered up to a 10X return on speech training investments.

Another VRT firm to watch is Kuato Studios, a game-based learning company founded by Frank Meehan, former board director for Spotify and Siri, together with SRI and Horizons Ventures, premier VC investors in Facebook, Spotify, Waze, Siri, and Fixmo. You can get a glimpse of what a Siri-like K-12 iPad app for teaching computer coding looks like at the Kuato Studios website.

As mentioned earlier, Stanford University’s Patrick Suppes, when receiving an SIIA lifetime achievement award at last spring’s Ed Tech Industry Summit, challenged our industry to offer VRT interfaces capable of understanding “emotional speech.” He explained that even infants can sense from speech cues when a parent is angry or happy and interpret their spoken messages accordingly. Nuance’s Mack told me his firm is working on suitable technology that will allow, for example, voice cues from a caller to an automated call center to be used to refer an angry or impatient caller to a more appropriate point in the program, such as to a live agent. Other related research using facial cues for discerning emotion has taken place at Stanford University and The University of Texas at Dallas. Considering advances in facial recognition for IT security and other applications, using facial cues this way may not be that far-fetched.


Thanks to commercial, political, and military R&D, voice recognition technology has come a long way in the past few years. Computer-based natural language processing has benefited from advances in the underlying artificial intelligence algorithms, many driven by big data technology, and from leaps in processing power, both on mobile devices and the servers that share the workload. With synergies to the “semantic web,” which envisions a searchable web that understands not just words but the intentions behind them, and the growing use of mobile technology in education, it’s clear that VRT will be a force for changing value propositions. For now the challenge is to find the right niches where it contributes the most. Language acquisition is certainly among the low-hanging fruit. What are the others in your sandbox? The concluding piece of this series, which will appear in December, shares reader feedback giving further examples of innovative products and services and reviews the hurdles and business implications for education resource providers seeking to harness these promising developments in big data and VRT.

(Note: For more on big data’s implications for K-12, check out AEP’s upcoming “Big Data Leadership Day - How to Go Boldly into the New Frontier,” CEO Roundtable in New York City, Wednesday, November 28.)
Dr. Nelson Heller is President of The HellerResults Group, a global strategic consultancy serving businesses and non-profits seeking growth opportunities in the education market. He is the founder of The Heller Reports newsletters and EdNET: The Educational Networking Conference, both started in 1989. The EdNET News Alert, successor to The Heller Reports publications and now published by MDR, reaches over 31,000 education executives worldwide every week and features a regular column from The HellerResults Group each month. You can learn more about Nelson and his industry leadership at The HellerResults Group. If you need strategic insight, business partners, international connections, stronger boards, keynoters, or entrepreneurial savvy and want the insight of 30 years at the business and technology crossroads of the education market, you can reach Nelson at 858-720-1914, by email at nelson@hellerresults.com, and on Twitter @NelsonHeller.