Earlier today I read a piece in The Atlantic entitled “The Quest For the Next Human-Computer Interface”, subtitled “What will come after the touch screen?”.
I’ve been interested in human-computer interfaces since the very early Eighties, when I first came across the work of Niklaus Wirth, Seymour Papert and Jef Raskin. For me, human-computer interfaces come in two kinds: interfaces to _build_ software and interfaces to _control_ software. Wirth worked mainly on the former, Raskin on the latter, and Papert in both areas, principally through his work on learning.
The Atlantic article is, of course, mainly concerned with the latter: how people control the software on their computing devices, how they enter data and how they get results.
It also starts from a broken premise: that there will be a “next” interface. “Next” implies that there was a previous interface and that it has now been replaced. This couldn’t be further from the truth. Only the most primitive computers predated the keyboard and the printer, two interfaces still going strong more than sixty years later. Speech recognition was usable for serious work as far back as the early 1980s. Touch screens date from the same time. Work on virtual and augmented reality, including work on gesture input, also began around then.
Let’s have a look at my favourite interface, the keyboard. You might think that not much has changed, but just think about spelling correction and predictive text. If you’re a programmer using a good editor, you even get fairly good (and improving) context-sensitive prediction: one moment the editor knows you are typing a variable name and predicts only variable names, then on the next line it realises you are calling a function and predicts function names instead. How about an editor that “knows” when you import a bunch of functions and adds them to the list it predicts from?
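To make the idea concrete, here is a minimal sketch of context-sensitive completion, assuming a deliberately crude rule: a bare word at the start of a statement is probably a function call, while a word after `=`, `(` or `,` is probably a value. The `Completer` class and its methods are hypothetical, not any real editor’s API. It also shows the “knows when you import” trick by adding names from a `from … import …` line to the function list.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Completer:
    """Toy context-sensitive completer (hypothetical, illustrative only)."""
    variables: set = field(default_factory=set)
    functions: set = field(default_factory=set)

    def learn_line(self, line: str) -> None:
        # "total = 0" -> remember 'total' as a variable (skips "==" comparisons).
        m = re.match(r"\s*([A-Za-z_]\w*)\s*=[^=]", line)
        if m:
            self.variables.add(m.group(1))
        # "def report(x):" -> remember 'report' as a function.
        m = re.match(r"\s*def\s+([A-Za-z_]\w*)", line)
        if m:
            self.functions.add(m.group(1))
        # "from statistics import mean, median as med" -> remember 'mean' and 'med'.
        # Assumption: every imported name is something you will call.
        m = re.match(r"\s*from\s+[\w.]+\s+import\s+(.+)", line)
        if m:
            for item in m.group(1).split(","):
                parts = item.split()
                if parts:
                    self.functions.add(parts[-1])  # the alias if "as" is used

    def suggest(self, line_so_far: str) -> list:
        # Separate the partially typed word from the text before it.
        m = re.search(r"[A-Za-z_]\w*$", line_so_far)
        partial = m.group(0) if m else ""
        before = line_so_far[: len(line_so_far) - len(partial)].strip()
        # Crude heuristic: statement start -> function call; otherwise -> value.
        pool = self.functions if before == "" else self.variables
        return sorted(name for name in pool if name.startswith(partial))


if __name__ == "__main__":
    c = Completer()
    for line in ["from statistics import mean, median as med", "total = 0", "count = 3"]:
        c.learn_line(line)
    print(c.suggest("result = co"))  # ['count']  (after "=", so variables)
    print(c.suggest("me"))           # ['mean', 'med']  (statement start, so functions)
```

Real editors get the context from a parser or language server rather than a regular expression, but the principle is the same: the surrounding text decides which pool of names is worth predicting from.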
Even better, in Google Wave Peter Norvig demonstrated context-sensitive spelling correction. In his example the system corrected “icland is an icland” to “Iceland is an island”. He also showed it correcting a number of homophones, turning “Are they’re parents going two the coast?” into “Are their parents going to the coast?”
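How can one misspelling become two different words in the same sentence? A minimal sketch, assuming a simplified approach rather than Wave’s actual implementation: generate candidates one edit away from each word (in the style of Norvig’s well-known “How to Write a Spelling Corrector” essay), add sound-alike words from a small confusion set, then let the neighbouring words vote using bigram counts. The counts below are invented toy numbers; a real system learns them from a very large corpus.

```python
import string

# Invented toy bigram counts, for illustration only.
BIGRAMS = {
    ("<s>", "iceland"): 5, ("iceland", "is"): 8,
    ("an", "island"): 9, ("island", "</s>"): 6,
    ("are", "their"): 7, ("their", "parents"): 9,
    ("going", "to"): 12, ("to", "the"): 15,
}
VOCAB = {w for pair in BIGRAMS for w in pair} | {
    "is", "an", "are", "there", "they're", "two", "too", "parents", "going", "the", "coast",
}
# Words that sound alike; edit distance alone would not connect "two" and "too".
HOMOPHONES = [{"their", "there", "they're"}, {"to", "two", "too"}]


def edits1(word):
    """All strings one delete, replace, insert or transpose away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    return set(deletes + replaces + inserts + transposes)


def candidates(word):
    cands = {word} if word in VOCAB else set()
    cands |= edits1(word) & VOCAB
    for group in HOMOPHONES:
        if word in group:
            cands |= group
    return cands or {word}  # unknown word with no close match: leave it alone


def score(prev, word, nxt):
    # Add-one smoothing so an unseen bigram doesn't zero out a candidate.
    return (BIGRAMS.get((prev, word), 0) + 1) * (BIGRAMS.get((word, nxt), 0) + 1)


def correct(sentence):
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    out = []
    for i in range(1, len(words) - 1):
        prev = out[-1] if out else "<s>"  # left context is already corrected
        nxt = words[i + 1]                # right context is still the raw input
        out.append(max(sorted(candidates(words[i])), key=lambda w: score(prev, w, nxt)))
    return " ".join(out)


if __name__ == "__main__":
    print(correct("icland is an icland"))  # iceland is an island
    print(correct("are they're parents going two the coast"))
    # are their parents going to the coast
```

“icland” has two candidates, “iceland” and “island”; which one wins depends entirely on its neighbours. (This toy lowercases everything, so restoring “Iceland” would be a separate step.)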
So while the physical keyboard has not improved (indeed, keyboard junkies like me feel it has gone backwards), the intelligence behind it has, and that has improved the interface.
How about that voice technology?
First, let’s dismiss one of the statements in the Atlantic article. Missy Cummings (head of Duke University’s Robotics Lab) says: “Of course, the problem with that is voice-recognition systems are still not good enough. I’m not sure voice recognition systems ever will get to the place where they’re going to recognize context. And context is the art of conversation.”
I’m going to break that down. Voice recognition is actually two problems. The first is translating the noise of a voice into a text stream. The second is understanding that text stream well enough for our software to act on the request. In good systems the second informs the first, but they are different problems. So when Cummings talks about recognizing context, she is talking about the second problem.
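One common way for the second problem to inform the first is n-best rescoring: the recogniser returns several candidate transcriptions with acoustic scores, and a domain model adds a bonus to the ones that make sense. The sketch below assumes a movie-search assistant; the `Hypothesis` type, the scores, the actor list and the weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    acoustic_score: float  # log-probability from a (hypothetical) speech recogniser


# Hypothetical domain knowledge for a movie-search assistant.
KNOWN_ACTORS = {"cameron diaz", "judi dench", "cate blanchett"}
DOMAIN_WORDS = {"show", "me", "all", "movies", "comedies"}


def domain_score(text: str) -> float:
    """Bonus for text that makes sense in the movie domain."""
    bonus = sum(2.0 for actor in KNOWN_ACTORS if actor in text)
    words = text.split()
    bonus += 0.5 * sum(w in DOMAIN_WORDS for w in words) / max(len(words), 1)
    return bonus


def pick(hyps, weight: float = 1.0) -> Hypothesis:
    # Combine "what it sounded like" with "what makes sense here".
    return max(hyps, key=lambda h: h.acoustic_score + weight * domain_score(h.text))


if __name__ == "__main__":
    hyps = [
        Hypothesis("show me all camera and ears movies", -4.1),  # acoustically best
        Hypothesis("show me all cameron diaz movies", -4.6),
    ]
    print(pick(hyps, weight=0.0).text)  # sound alone: show me all camera and ears movies
    print(pick(hyps).text)              # with domain context: show me all cameron diaz movies
```

The `weight` parameter controls how much the understanding layer is allowed to overrule the acoustics; setting it to zero gives you the recogniser on its own.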
For all intents and purposes, the first problem has been solved. Translating the noise of your voice into a text stream is becoming more reliable, less thrown by your accent and faster by the day. Siri, for example, does this superbly.
So it is in the second problem that the remaining work, and the ongoing improvement, lies. This is the field of study called “natural language processing”. The problem Cummings describes touches on discourse analysis, text linguistics and topic segmentation, and all of these sub-fields have continued to progress. Indeed, progress has been amazing within what researchers call “limited domains”, where the general topic of a conversation (or discourse) is restricted to a specific area.
An example might be a search of a movie database.
“Show me all Cameron Diaz’s movies.”
“I’ve got 32 movies.”
“OK, how about just her comedies?”
“Here are the six movies starring Cameron Diaz marked as comedies.”
That is a conversation that uses context. It is a tiny example, but the computer has to work out what “her” means from the context of the conversation. In the next conversation, “her” might be Judi Dench or Cate Blanchett. The domain is limited and the context is easy, but it *is* recognizing context. Research continues on understanding more complex context across wider domains, and Siri, the Amazon Echo and their ilk are improving constantly.
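In a limited domain, even a very simple dialogue state is enough to handle the “her” in the exchange above. A minimal sketch, assuming a one-slot memory (the last person mentioned) and a tiny hand-made sample of films; real assistants use far richer coreference and dialogue models. `DialogueState` and `ask` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Tiny hand-made sample database: title -> (stars, genre). Illustrative only.
MOVIES = {
    "The Mask": ({"Cameron Diaz", "Jim Carrey"}, "comedy"),
    "Gangs of New York": ({"Cameron Diaz", "Leonardo DiCaprio"}, "drama"),
    "Notes on a Scandal": ({"Judi Dench", "Cate Blanchett"}, "drama"),
}
ACTORS = sorted({star for stars, _ in MOVIES.values() for star in stars})
PRONOUNS = {"her", "him", "his", "she", "he"}


@dataclass
class DialogueState:
    """Remembers who the conversation is about, so pronouns can be resolved."""
    last_person: Optional[str] = None

    def ask(self, utterance: str) -> List[str]:
        text = utterance.lower()
        words = set(re.findall(r"[a-z']+", text))
        # An explicit name wins; otherwise a pronoun refers back to the last person.
        person = next((a for a in ACTORS if a.lower() in text), None)
        if person is None and words & PRONOUNS:
            person = self.last_person
        if person is None:
            return []  # nothing to resolve the pronoun against yet
        self.last_person = person
        genre = "comedy" if "comed" in text else "drama" if "drama" in text else None
        return sorted(
            title for title, (stars, g) in MOVIES.items()
            if person in stars and (genre is None or g == genre)
        )


if __name__ == "__main__":
    s = DialogueState()
    print(s.ask("Show me all Cameron Diaz's movies."))  # ['Gangs of New York', 'The Mask']
    print(s.ask("OK, how about just her comedies?"))    # ['The Mask']
```

Start a new `DialogueState`, ask about Judi Dench first, and the same “her” now points at her instead.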
We have also seen constant improvements in touch interfaces. The hardware has improved, with high-resolution capacitive touch screens replacing earlier resistive screens, and so has the interface software, where tap, tap and hold, hard tap and hold, and swipe are all recognised with different meanings (and often different meanings in different contexts). Touch screen software is even getting good at telling your finger or a pen apart from your hand accidentally brushing the screen.
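The gesture software described above boils down to classifying each touch from a few measurements and then looking up what that gesture means in the current context. A minimal sketch, assuming a screen that reports duration, movement, pressure and contact area; every threshold and the context-to-action table are hypothetical, and real platforms tune such values per device. Treating a large contact patch as a resting hand is one simple form of palm rejection.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real platforms tune values like these per device.
LONG_PRESS_MS = 500
SWIPE_DISTANCE_PX = 30
HARD_PRESSURE = 0.8       # normalised 0..1; assumes the screen reports pressure
PALM_CONTACT_MM2 = 150    # large contact patches are treated as a resting hand

# The same gesture can mean different things in different contexts (hypothetical table).
ACTIONS = {
    ("home_screen", "tap"): "open app",
    ("home_screen", "tap and hold"): "rearrange icons",
    ("text_field", "tap and hold"): "select word",
    ("text_field", "hard tap and hold"): "preview link",
}


@dataclass
class Touch:
    duration_ms: float
    distance_px: float
    max_pressure: float
    contact_area_mm2: float


def classify(t: Touch) -> str:
    if t.contact_area_mm2 >= PALM_CONTACT_MM2:
        return "rejected"  # probably a hand brushing the screen
    if t.distance_px >= SWIPE_DISTANCE_PX:
        return "swipe"
    if t.duration_ms >= LONG_PRESS_MS:
        return "hard tap and hold" if t.max_pressure >= HARD_PRESSURE else "tap and hold"
    return "tap"


def action(context: str, t: Touch) -> str:
    return ACTIONS.get((context, classify(t)), "no action")


if __name__ == "__main__":
    hold = Touch(duration_ms=700, distance_px=5, max_pressure=0.3, contact_area_mm2=40)
    print(action("home_screen", hold))  # rearrange icons
    print(action("text_field", hold))   # select word
```

The order of the checks is a design choice: palm rejection runs first so that a brushing hand never gets as far as being read as a swipe.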
So what will the next human-computer interface be? Mostly the old ones with improved software, hardware and interface design.