If you listen closely, you can hear the quiet anticipation of Amazon’s Alexa, Google’s Assistant and other mainstream voice assistants and ecosystems becoming HIPAA-compliant, a change that will undoubtedly open the floodgates of innovation in the sector. Even with privacy and data security concerns that will linger well beyond this regulatory hurdle, voice-first technology’s intersection with modern healthcare is poised to explode.
Interestingly, even in the absence of HIPAA compliance, there is a remarkable amount of investment, experimentation and exploration of the voice tech space by healthcare companies, large and small, right now. Leading organizations like Mayo Clinic, which created the award-winning “First Aid” Alexa skill, sit alongside startups like Orbita, LifePod and many others that have already jumped into this exciting, growing area.
Matt Cybulsky, Ph.D., and I produced something we call the VHI, or “Voice of Healthcare Index.” This is the percentage of queries (out of 300 total, posed to each voice assistant) that Alexa, Google Assistant, Siri, Cortana, Bixby and Hound were able to recognize.
These queries were broken down into six major categories: illegal drugs, legal drugs, medical centers, healthcare companies, isolation and symptoms/pathology. Each query was asked in a consistent fashion, allowing for multiple attempts if the voice assistant did not understand the first time. And each query was constructed simply, to mimic how these voice assistants are most likely to be called upon. “Alexa, tell me all the side effects of Abilify,” is far less likely to be asked than, “Alexa, what is Abilify?” and our queries within the informal study are direct and generally test a broad surface area of understanding of the voice assistants.
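The arithmetic behind the index is simple enough to sketch. The snippet below is a minimal illustration, not the authors’ actual tooling: it computes a VHI as the recognized-query count divided by the 300 total queries, rounded to a score out of 100, using the three per-assistant counts the article reports (250 for Hound, 248 for Google Assistant, 196 for Alexa).

```python
# Minimal sketch of the VHI calculation: the share of the 300 queries a
# voice assistant recognized, expressed as a rounded score out of 100.
# The counts below are the recognized-query totals reported in the article.
TOTAL_QUERIES = 300

recognized = {
    "Hound": 250,
    "Google Assistant": 248,
    "Alexa": 196,
}

def vhi(recognized_count: int, total: int = TOTAL_QUERIES) -> int:
    """Voice of Healthcare Index: recognized queries as a score out of 100."""
    return round(100 * recognized_count / total)

for assistant, count in recognized.items():
    print(f"{assistant}: VHI {vhi(count)}/100")
```

Note how the rounding explains the tie at the top of the table: 250/300 and 248/300 both round to 83.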
In completing this study, and even in the process of conceiving such an effort, we learned a lot about the state of readiness of these various voice assistants to assist in the noble task of improving modern healthcare.
Here are the results.
Voice of Healthcare Index (VHI) Scores:
Hound (SoundHound, Inc.) – 83/100
Google Assistant – 83/100
Cortana (Microsoft) – 70/100
Alexa (Amazon) – 65/100
Siri (Apple) – 53/100
Bixby (Samsung) – 38/100
Google Assistant and Hound, which is SoundHound, Inc.’s voice assistant, tied with a VHI of 83 (out of 100). Hound answered slightly more queries correctly (250 vs. Google Assistant’s 248), but if we had measured for quality of response, Google Assistant generally would have beaten Hound across all six categories.
In the example below, when we asked Hound, “How much does it cost to go to the hospital?” — a query all the voice assistants struggled with — Hound was the only one to summon Uber prices to show you how much it would cost to, well, get you there. Not the intent of the question, but still, a valid response, and one that shows Hound understood the query, even with a different end interpretation.
Google Assistant has the company’s entire reservoir of data to fall back on in interpreting queries, in healthcare or related to any other topic. Hound’s strong performance in this study speaks to the company’s top-shelf natural language understanding (NLU) capabilities, which are among the reasons the company was able to raise $100 million to pursue growth.
Google Assistant’s weakest area was illegal drugs, where it understood 33 out of our 50 queries and had good responses for far fewer. Hound, on the other hand, had a profound weakness with our battery of prescription drug queries.
Even with the similar VHI scores, there is little doubt Google Assistant is the voice assistant that is “most ready,” right this second, for healthcare use. The voice assistant has a remarkable range of curated voice responses — an area where Alexa also shines — but simply is more consistent in being able to interpret healthcare-related queries and respond in a way that helps.
Alexa understood 196 out of the 300 queries, for a VHI of 65 (out of 100). This was the fourth-highest score of the six voice assistants in the study, ranking Alexa above Siri and Bixby, but behind Hound, Google Assistant and Cortana.
Alexa understood the most queries within the category of healthcare companies (40 out of 50) and the fewest in the category of medical centers (22 out of 50). Overall, however, Alexa’s performance across all six areas of medical terminology was fairly consistent — no huge gaps in knowledge.
Amazon’s DNA as a retailer marred the experience of asking Alexa about healthcare topics. One example of this was with illegal drugs. Telling non-Amazon voice assistants, “I need marijuana” or “I need opioids” or “I need methadone” (three of the most abused drugs in the United States), resulted in information about getting help for drug addiction. When we asked Alexa the same questions, however, these three drugs were added to a grocery list for later purchase. Fortunately, for other related queries, Alexa directed users to an addiction hotline or took a more helpful, relevant action.
The strength of Alexa is the sheer number of people working on improving it as a voice assistant — rumored to be in the tens of thousands — which manifests in the richness of some of Alexa’s responses to medical queries. Siri, by contrast, would often recognize queries but then present poor information in response. Quality of information matters, and it is a strength of the Alexa ecosystem relative to the other main voice assistants. There is little doubt that Amazon could shore up the deficiencies this study unearthed in short order, should it choose to do so.
Microsoft’s voice assistant, Cortana, has carved out a niche for itself. Microsoft has partnered with Amazon to integrate Cortana and Alexa in a seamless way, and Microsoft CEO Satya Nadella has said he sees Cortana itself as an Alexa skill, albeit one with significant horsepower.
What’s interesting about our findings on Cortana is that the technology outperformed Alexa in four of the six healthcare categories (healthcare companies, legal drugs, medical centers and symptoms/pathology). Only with illegal drugs and isolation did Alexa surpass Cortana.
Generally, Cortana performed well and performed consistently within a healthcare context, relative to its peers.
Siri, Apple’s voice assistant, and Bixby, Samsung’s voice assistant, both need significant improvement and are certainly in a state of less readiness for healthcare applications than their competitors.
Siri, which had a VHI of 53, did beat Bixby, which had a VHI of 38, but it was our sentiment that Bixby’s quality of answers, when it recognized the query, was superior to Siri’s. Siri not only suffers from poor sources of information and poor use of those sources, but it also curiously undergoes intermittent service outages (which we experienced more than once while completing this study). These outages lead users to believe Siri doesn’t understand a query, with no notice of any sort of cloud-based problem.
Siri also put up the single worst category score of any voice assistant, registering an understanding of only five terms (out of 50) within our symptoms/pathology category. In fairness, Bixby wasn’t far behind, registering only six (out of 50) terms within the prescription drug category, and then only 10 (out of 50) in the same symptoms/pathology category in which Siri had five.
Worse, for queries Siri did actually understand (such as, “How do I know if I drink too much?”), it provided responses that attempted humor but actually obscured Siri’s functionality.
Bixby, which was announced in 2017, is just now receiving significant attention and resources from Samsung, which has opened up Bixby to third-party development, a move that will certainly help Bixby take needed steps of improvement. Apple, by contrast, has not opened Siri up in the same way, despite having owned the technology since 2011. The good news is that Apple has considerable resources at its disposal, if it decides to enhance Siri via acquisitions or to ramp up hiring to the level at which Amazon is hiring for Alexa-related work.
It is surprisingly difficult to determine how best to assess the “healthcare readiness” of a voice assistant. If a voice assistant says, “Sorry,” when you tell it, “I need illegal drugs,” was it being sympathetic, or simply failing to understand what you said? There’s an interpretation hurdle to overcome that we saw in several queries we presented to the six voice assistants.
Separately, these voice assistants uniformly need a better way to divorce one’s location from one’s questions about healthcare. All of the voice assistants generally did poorly with questions about the best hospitals, the worst hospitals, and the best hospitals for particular areas of need (e.g. heart disease, cancer). Instead of answering, several assistants tried to use location information in ways that didn’t serve the user or add value to the question.
These are early days, and in some ways the results shown in this study are quite impressive, even on the lower end of the spectrum. But there is still quite a ways to go, and hopefully this study will help these voice assistants get there faster.
Bradley Metrock is CEO of Score Publishing, which produces the series of VoiceFirst Events, including the upcoming Voice of Healthcare Summit, Aug. 5-6 at Harvard Medical School’s Martin Conference Center in Boston, Mass. Dr. Matt Cybulsky is principal of Ionia, a healthcare consulting firm, and co-founder of AlphaVoice, a voice-first tech development firm. Both Metrock and Cybulsky co-host The Voice of Healthcare Podcast.