Saturday, April 14, 2007

Wired for Speech

I just read Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship, a book by Stanford prof Clifford Nass and Scott Brave. The Amazon link above lets you preview parts of the book. Nass co-authored The Media Equation with Byron Reeves, and in many ways Wired for Speech is a continuation and specialization of that book. It's a statement about how our brain's recognition capabilities for speech are very specifically wired, and how knowledge of that fundamental wiring is important to building speech-driven interfaces. Worth reading for people building these interfaces.

It also makes the point, emphasized in The Media Equation, that humans interpret synthesized speech as though it were delivered in a human-to-human interaction, even when it is obviously synthesized and they are repeatedly reminded of that fact. In other words, many of the same rules for human speech interaction apply to interaction between human and machine. This is startling stuff, and well worth understanding. Nass and Brave support their conclusions with experiments on groups of volunteers, usually fewer than a hundred, who I assume are Stanford students. That is hardly representative of real consumers, and I wonder about statistical significance. A number of these experiments are described in the book.

Nass was also involved in an experiment to deliver a sophisticated car navigation system for BMW. Car navigation assistance is a good example of this kind of voice interaction. Other excellent examples are systems that deliver consultation over the phone. Many of their conclusions rang true as I thought about my own experiences with systems support at P&G.

He makes the point:
" ... there is a sad irony: voice interfaces can seem very smart without knowing anything. The core problem is that humans have simplistic rules for assessing performance and these rules can be leveraged to make things seem smarter than they are. Reminding people that they depend on the interface for their success automatically makes the computer seem more intelligent. Attractive faces and voices are perceived as better able to perform tasks. And as noted earlier, labeling a part of the interface as a specialist, conforming to gender stereotypes, flattering the user, or matching the user's personality, also increase perceived competence. Indeed, people are so susceptible to manipulation that perceived intelligence is a very weak predictor of actual intelligence (it explains less than half of the variance) ..." (p. 152)
He spends much time on gender stereotyping in systems, making the point that the wiring for detecting and responding to the gender of a voice forms at a very young age. So do you use this finding to improve your voice-driven interaction system, or do you attempt to re-engineer people from the top down? Heady stuff.

Good book if you are designing or redesigning a voice-based system, or if you are interested in the linking of people and machines.
