I agree with everything you said here, wholesale, no exceptions. I’ve considered all of it, and I’ve explained all of it across my (many) articles on the subject. So let me respond to each of your points in turn:

“people use smartphones for something to do and to look at i.e. they like having a screen to look at. At home it may be quicker to ask an Amazon Echo for some info when you need it in a hurry — but most people use their phones for tasks they don’t need to do, like looking at videos on YouTube or Twitch, or pics on Instagram.”

Absolutely! That’s the connotation of “voice-first” as opposed to “voice-only.” Even in today’s touch/text regime, you have alternative mediums for interacting with media. The voice interface epoch merely inverts that order of preference; it doesn’t eliminate the visual element or its proponents. It suggests that the majority of our interactions either already are or already can be denominated in voice (“end-to-end audio”), which is the most frictionless transmission mechanism available to us today.

You’ll probably still have a TV, a smartphone, a laptop, a tablet, an Echo, etc. But once that inversion happens (the “tipping point”), I ask: what’s the real value of a premium smartphone or smartwatch, considering that these devices are relegated to supporting roles? During the voice-first transition, subordinate devices serve as both a backend (for the storage and compute that discreet earbuds can’t sustain on their own) and a backup (for displaying visual information that voice can’t convey).

Perhaps I should be clearer about the notion of a “strong-form” voice epoch: the rest of the device ecosystem won’t disappear, any more than PCs have in the mobile era; it will just be subject to price disinflation.

“people aren’t comfortable using voice commands in public for personal tasks”

I discussed that in my original article. I also think of that every time I see this:

A newspaper is the devil’s picturebook

I think of that every time I see this too:

Earbuds are the devil’s orchestra

Finally, I think of this every time I see two people (or a group) walking down the street together, having a conversation aloud.

Catch my drift?

“…I also believe that smartphones will be eventually supplanted, not by voice, but by AR glasses.”

Completely agree, and I said as much myself. We’re not ready yet, though. Voice is the next big thing today because we’re equipped for it: it’s ergonomic and unobtrusive. AR doesn’t have a hardware/delivery solution yet, so it’s neither ergonomic nor unobtrusive. (For example, consumers have repeatedly shot down glasses as the transmission mechanism.) The successor to Moore’s Law will eventually deliver a worthy AR form factor, maybe contact lenses, but not today.

Be the signal, not the noise.

Highlights by you and for you. On all the blogs you want to read. Annotote is the new social. #LeaveYourMark

You read a lot, and now you can show everyone what you know, with Annotote.

