Historically, building technology had been about capabilities and features. Engineers and product designers would come up with new things they thought people wanted, figure out how to make them work and ship "new and improved" products. The result was often products that were maddeningly difficult to use.
That began to change when Don Norman published his classic, The Design of Everyday Things, and introduced concepts like conceptual models, affordances and natural mapping into industrial design. The book is widely seen as pioneering the user-centered design movement. Today, UX has become a thriving field.
Yet artificial intelligence poses new challenges. We speak or type into a computer interface and expect machines to respond appropriately. Often they do not. With the rising popularity of smart speakers like Amazon Alexa and Google Home, we have a dire need for clear principles for human-AI interactions. Two researchers at IBM have embarked on a journey to establish them.
The Science Of Conversations
Bob Moore first came across conversation analysis as an undergraduate in the late 1980s, became intensely interested and later earned a PhD based on his work in the field. The central problems are well known to anybody who has ever watched Seinfeld or Curb Your Enthusiasm: our conversations are riddled with complex, unwritten rules that aren't always obvious.
For example, every conversation has an unstated goal, whether it is just to pass the time, exchange information or inspire an emotion. Yet our conversations are also shaped by context. The unwritten rules differ for a conversation between a pair of friends, between a boss and a subordinate, in a courtroom or in a doctor's office.
"What conversation analysis basically tries to reveal are the unwritten rules people follow, bend and break when engaging in conversations," Moore told me and he soon found that the tech industry was beginning to ask similar questions. So he took a position at Xerox PARC and then Yahoo! before landing at IBM in 2012.
As the company was working to integrate its Watson system with applications from other industries, he began to work with Raphael Arar, an award-winning visual designer and user experience expert. The two began to see that their interests were strangely intertwined and formed a partnership to design better conversations for machines.
Establishing The Rules Of Engagement
Typically, we use natural language interfaces, both voice and text, like a search box. We announce our intention to seek information by saying, "Hey Siri," or "Hey Alexa," followed by a simple query, like "where is the nearest Starbucks?" This can be useful, especially when driving or walking down the street, but is also fairly limited, especially for more complex tasks.
What's far more interesting -- and potentially more useful -- is being able to use natural language interfaces in conjunction with other interfaces, like a screen. That's where the marriage of conversation analysis and user experience becomes important, because it will help us build conventions for more complex human-computer interactions.
"We wanted to come up with a clear set of principles for how the various aspects of the interface would relate to each other," Arar told me. "What happens in the conversation when someone clicks on a button to initiate an action?" What makes this so complex is that different conversations will necessarily have different contexts.
For example, when we search for a restaurant on our phone, should the screen bring up a map, information about pricing, pictures of food, user ratings or some combination? How should the rules change when we are looking for a doctor, a plumber or a travel destination?
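One way to picture such rules is a mapping from the kind of thing a user is asking about to the interface elements shown alongside the conversation. The intents and element names below are hypothetical, a minimal sketch rather than any actual system's design:

```python
# Hypothetical sketch: which on-screen elements accompany a spoken answer
# depends on what the user is searching for. All names are illustrative.
DISPLAY_RULES = {
    "restaurant": ["map", "price_range", "photos", "ratings"],
    "doctor":     ["map", "specialties", "insurance_accepted", "ratings"],
    "plumber":    ["map", "availability", "ratings"],
    "travel":     ["photos", "weather", "flight_prices"],
}

def elements_for(intent: str) -> list:
    """Return the screen elements to render for a given query intent."""
    return DISPLAY_RULES.get(intent, ["generic_results"])

print(elements_for("restaurant"))  # ['map', 'price_range', 'photos', 'ratings']
```

In practice such rules would be far richer than a lookup table, but the table makes the design question concrete: someone has to decide, for each kind of query, what belongs on the screen.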
Deriving Meaning Through Preserving Context
Another aspect of conversations is that they are highly dependent on context, which can shift and evolve over time. For example, if we ask someone for a restaurant nearby, it would be natural for them to ask a question to narrow down the options, such as "what kind of food are you looking for?" If we answer, "Mexican," we would expect that person to know we are still interested in restaurants, not, say, the Mexican economy or culture.
Another issue is that when we follow a particular logical chain, we often find some disqualifying factor. For instance, a doctor might be looking for a clinical trial for her patient, find one that looks promising, but then see that the study is closed. Typically, she would have to retrace her steps to find other options.
"A true conversational interface allows us to preserve context across the multiple turns in the interaction," Moore says. "If we're successful, the machine will be able to adapt to the user's level of competence, serving the expert efficiently but also walking the novice through the system, explaining itself as needed."
And that's the true potential of more natural conversations with computers. Much like working with humans, the better we are able to communicate, the more value we can get out of our relationships.
Making The Interface Disappear
In the early days of web usability, there was a constant tension between user experience and design. Media designers were striving to be original. User experience engineers, on the other hand, were trying to build conventions. Putting a search box in the upper right-hand corner of a web page might not be creative, but that's where users look to find it.
Yet eventually a productive partnership formed, and today most websites seem fairly intuitive. We mostly know where things are supposed to be and can navigate them easily. The challenge now is to build that same type of experience for artificial intelligence, so that our relationships with the technology become more natural and more useful.
"Much like we started to do with user experience for conventional websites two decades ago, we want the user interface to disappear," Arar says. Because when we aren't wrestling with the interface and constantly having to repeat ourselves or figuring out how to rephrase our questions, we can make our interactions much more efficient and productive.
As Moore put it to me, "Much of the value of systems today is locked in the data and, as we add exabytes to that every year, the potential is truly enormous. However, our ability to derive value from that data is limited by the effectiveness of the user interface. The more we can make the interface become intelligent and largely disappear, the more value we will be able to unlock."