Why the voice interface is inevitable, and why its new paradigms will be defined by companies like Annotote
Voice assistants’ weaknesses are end-to-end audio’s strengths — and there’s a huge opportunity to invent the user experience for this brave new world
Regarding your first point, “Natural Language,” machine learning and natural language processing are making huge strides. The subsets you’re referring to specifically are AI for contextual and semantic processing, which have reached the level of intelligence you’re asking for, but privacy is a major inhibitor of consumer implementation. For example, between your search history, calendar, email, and GPS tracking, Google has access to enough data to power a high-fidelity semantic voice assistant, but both corporate controls and regulatory limitations restrict the commingling of all that data it holds on you.
In addition, I addressed the current, underwhelming state of natural language processing in my article, because it’s not a problem for the voice interface I discussed. In fact, end-to-end audio turns underwhelming NLP into a value proposition:
“[With respect to] natural language processing accuracy: If inputs remain audio recordings by default, they’ll sidestep the problem of text transcription errors entirely… by avoiding text transcription altogether […] I chose my words carefully above: “voice as the primary medium.” Primary doesn’t mean exclusive […] When we invert users’ habits by prioritizing audio over text, text will still be there as a backup when necessary […] If developers start featuring audio as a default, they will accelerate user adoption and the formation of new habits — especially since the value proposition (reduced friction) is so strong for the majority of use cases.”
In other words, believe it or not, your example of ‘calling Lisa Green’ is not in “the majority of use cases” for digital consumers. Sending Lisa Green an audio message would be a more representative parallel. Furthermore, an audio message has the added benefit of reducing the dependency on NLP/ML/transcription (compared to sending Lisa a text-based email or message). If all else fails, you still have a physical, visual device as a backup for making touch-based selections, but the point is that end-to-end audio massively reduces friction across all use cases, in aggregate.
Finally, regarding your second point, “Reading,” I completely agree. The voice interface will need new paradigms to accommodate common consumer behaviors like skimming, browsing, pausing, and re-reading. Some of these habits have already started changing (a lot of people are getting acclimated to the voice medium thanks to podcasts and audiobooks); others will adapt or evolve (I mentioned “the formation of new habits”); and a few require a radical rethinking of the user experience.
In that vein, Annotote is at the forefront of developing user experiences for modern media and the next wave(s). I read the same way you do, with the same commute and the same interruptions. That’s why Annotote’s network provides summaries to make it easier to read everything you’d read anyway. So, when you’re on the train or walking to work, you can get straight to the point. If you want, you can still enjoy the whole article — you can even add highlights or notes — but Annotote gives you all signal/no noise so you don’t start by wasting your time. (And you better believe we’re ready to facilitate voice too, because you’re right: it would be “super complicated” given the way we’re accustomed to reading 😉)
Check it out and preregister for our public alpha while it’s still open: