Voice is possibly the most natural user interface and a natural fit in healthcare, where conversation is the most prevalent form of patient/provider interaction. Voice interfaces have the potential to help clarify communications, enable patients to self-serve, and deliver empathy in healthcare settings.

With the recent news from Amazon that Alexa can be used for HIPAA-compliant skills, the opportunities are broader than simple healthcare education or queries and can include two-way asynchronous communication and data collection.

Applications for voice interfaces aimed at patients both in in-patient and outpatient settings include:

  • Educational content delivery
  • Scheduling appointments
  • Patient self-discovery and triage
  • Collecting patient-generated health data

There are also strong use cases in pharmaceutical and life sciences, such as clinical trial check-ins, medication adherence, and education. For health insurers, applications might include looking up benefits or other coverage information, learning about wellness, or finding a provider.

Voice is a natural interface in healthcare as it can mimic the types of interviews and interactions that people have as part of an in-person visit. However, with voice interfaces users expect to be able to explain things the way they would to a human, and they may get frustrated if the assistant can’t understand things like medical terms or medication names. This will improve as the underlying technology is paired with medical-specific dictionaries, but medication names may always prove challenging. Focusing on the end user and the task they need to accomplish (“I took my medication”) rather than on exhaustive accuracy (“I took exactly this dosage of this brand name, also known by this name”) can solve some of these challenges.
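To make the task-over-accuracy idea concrete, here is a minimal, hypothetical sketch of how a skill backend might resolve varied patient phrasings to a single task-level intent rather than demanding an exact drug name or dose. The phrase list and intent names are illustrative assumptions, not part of any real skill.

```python
# Hypothetical sketch: map varied patient phrasings onto one coarse,
# task-level intent ("medication taken") instead of requiring exact
# brand names or dosages. Phrases and intent names are illustrative.
MEDICATION_TAKEN_PHRASES = (
    "i took my medication",
    "i took my meds",
    "i took my pill",
    "took it this morning",
)

def resolve_intent(utterance: str) -> str:
    """Return a coarse task intent for a raw utterance (illustrative only)."""
    text = utterance.lower().strip()
    if any(phrase in text for phrase in MEDICATION_TAKEN_PHRASES):
        return "MedicationTakenIntent"
    return "FallbackIntent"

print(resolve_intent("I took my meds after breakfast"))  # MedicationTakenIntent
print(resolve_intent("What's the weather like?"))        # FallbackIntent
```

In a production skill this matching would typically be handled by the platform’s natural language model and slot types, but the design principle is the same: accept many phrasings for one patient goal.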


As with any new technology, there is often a learning curve or an experience curve in figuring out what the best use cases are. Often when something is newly introduced it is viewed as the solution for all the clumsy interfaces of the past, and when it doesn’t solve every problem, early adopters are sometimes jaded. By spending some time thinking about use cases, scenarios, and user goals you can design experiences which truly highlight the new technology and delight the users.

Considering that people can typically hold only about five items in working memory, possibly the most obvious use is for quick-hit interactions, such as reminders or daily tips on one’s health or condition management, rather than explaining complex ideas or complicated medical terms. Most healthcare experiences are based on a conversation, so voice is a natural fit for a brief exchange of information, without requiring a log-on or searching a screen for a simple answer to a question. People also find voice interaction friendlier: in testing we did with people with Type 2 diabetes, patients told us that “it seemed like she cares,” which highlights the ability to design empathetic interfaces.

Guided interactions are also a great fit for voice, where you walk someone through a simple interaction that may take different paths depending on responses. However, consider that these interfaces need to be adaptive: people expect to be able to tell the voice assistant anything; they don’t expect it to behave like a phone tree or a robocall. Imagine wanting to check a symptom and having the conversation start with “listen carefully because our options have changed.” The user needs to be in control, not the application. The user wants to tell you their symptoms and get some suggestions, not walk through a list that is hard to remember.

These guided interactions need to be semi-structured to adapt to the interface. Take, for example, the KOOS survey, a standardized outcome survey often used to measure patient progress for value-based bundles for knee replacement. Patients often complete this survey online, and it’s required to be delivered a certain way. The survey asks for responses from five choices, which is hard for people to remember.

Here’s an example:

  1. How severe is your knee stiffness after first wakening in the morning?
     None    Mild    Moderate    Severe    Extreme

If this survey is delivered via voice, the person has to remember all the options and the question in their head to be able to answer. A voice-first approach might be:

Voice Assistant: Is your knee stiff when you wake up in the morning?

Person: Yes

Voice Assistant: How stiff is your knee?

Person: Really stiff. I can’t bend it at all.


This is more tuned to how we speak, but is this actually the same survey? Does “really stiff” equate to “severe” or “extreme?” How is the additional information recorded? You can see that this conversation is more realistic and closely related to user goals than the survey, but how do you record and normalize this feedback?
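One answer to the normalization question is to map free-text responses onto the survey’s five-point scale while always retaining the raw utterance, so nothing the patient said is lost. The sketch below is a hypothetical keyword-based approach, not the actual KOOS scoring rules; note that the ordering of the checks encodes a policy decision about whether “can’t bend it at all” counts as extreme rather than severe.

```python
# Illustrative sketch (not the real KOOS scoring rules): normalize a
# free-text severity description onto the survey's five-point scale,
# keeping the raw utterance alongside the mapped answer. The keyword
# lists and their precedence order are assumptions for illustration.
SEVERITY_KEYWORDS = [
    ("extreme",  ["can't bend", "cannot bend", "unbearable", "extreme"]),
    ("severe",   ["really stiff", "very stiff", "severe"]),
    ("moderate", ["moderate", "pretty stiff", "somewhat"]),
    ("mild",     ["a little", "slightly", "mild"]),
    ("none",     ["not stiff", "no stiffness", "fine", "none"]),
]

def normalize_severity(utterance: str) -> dict:
    """Map a free-text answer to a survey category, preserving the raw text."""
    text = utterance.lower()
    for label, keywords in SEVERITY_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return {"survey_answer": label, "raw_utterance": utterance}
    return {"survey_answer": None, "raw_utterance": utterance}

print(normalize_severity("Really stiff. I can't bend it at all."))
```

In practice this mapping would more likely be done by the platform’s language understanding with synonym slots, and any unmapped answer (`survey_answer` of `None`) is a cue for the assistant to ask a clarifying follow-up rather than guess.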

It’s often better to get some information and to get the conversation flowing to engage patients than to strive for total accuracy, but it’s worth considering that the way you collect information from people will impact the information you collect.

In the previous example, you can see that unlike delivering a survey through web or mobile, you need to be prepared for answers that might not be what you expected. However, at the same time, there’s an opportunity to launch into deeper branches of investigation: In addition to answering simple questions and delivering simple, straightforward information, artificial intelligence offers the possibility of responsive care by changing the interaction based on patient feedback. A question or concern about a symptom can launch a deeper inquiry about the problem, provide potential next steps, or trigger an alert for the care team. These potentials also suggest that a voice-activated device could be useful as a first step and provide basic triage skills, giving a patient guidance on how to proceed and delivering information in advance of a doctor or hospital visit.
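The idea of changing the interaction based on patient feedback can be sketched as a simple routing decision: depending on the reported symptom and severity, the assistant either continues the conversation, asks deeper follow-up questions, or alerts the care team. The symptoms, thresholds, and outcome names below are hypothetical, not a clinical protocol.

```python
# Hedged sketch of feedback-driven branching: route a patient report to a
# next step. The urgent-symptom set, severity thresholds, and outcome
# names are illustrative assumptions, not clinical guidance.
def triage(symptom: str, severity: str) -> str:
    """Decide the next conversational step for a reported symptom."""
    urgent_symptoms = {"chest pain", "shortness of breath"}
    if symptom in urgent_symptoms or severity == "extreme":
        return "alert_care_team"    # trigger an alert for the care team
    if severity in ("severe", "moderate"):
        return "deeper_inquiry"     # launch follow-up questions
    return "log_and_reassure"       # record the answer, offer guidance

print(triage("knee stiffness", "extreme"))  # alert_care_team
print(triage("knee stiffness", "mild"))     # log_and_reassure
```

A real system would layer clinician-reviewed rules (or a model) behind this branch point, but the structure — every answer feeding a decision about the next step — is what distinguishes responsive care from a static survey.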

With support for HIPAA-eligible skills from Amazon, we can only expect that Google Home will soon support these scenarios as well. That said, concerns about privacy abound. One of the first concerns is identifying who is speaking. Voice PINs are an easy way to password-protect a healthcare skill for a specific patient, and both Amazon and Google are working on voice recognition. Consider too that these smart speakers are usually in the home, a trusted environment where people may overhear your phone conversations as well, and this concern becomes less significant.
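On the backend, a voice PIN should never be stored in plain text. A minimal sketch, assuming standard-library salted hashing (the enrollment and verification wiring here is hypothetical, not how any particular skill platform implements it):

```python
# Sketch of voice-PIN protection: store only a salted hash of the PIN and
# verify attempts with a constant-time comparison. The enroll/verify API
# shape is an assumption for illustration.
import hashlib
import hmac
import os

def hash_pin(pin: str, salt: bytes) -> bytes:
    """Derive a salted hash of the PIN with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, 100_000)

def enroll(pin: str) -> tuple:
    """Return (salt, stored_hash) to persist instead of the raw PIN."""
    salt = os.urandom(16)
    return salt, hash_pin(pin, salt)

def verify(pin_attempt: str, salt: bytes, stored: bytes) -> bool:
    """Check a spoken PIN attempt against the stored salted hash."""
    return hmac.compare_digest(hash_pin(pin_attempt, salt), stored)

salt, stored = enroll("4812")
print(verify("4812", salt, stored))  # True
print(verify("0000", salt, stored))  # False
```

The constant-time comparison (`hmac.compare_digest`) avoids leaking information through timing, a small but standard precaution when the credential gates access to health data.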

Other privacy concerns arise around the actual speech utterances and how they are protected by the creators of the skills and the technology vendors. If protected health information (PHI) is going to be delivered from a patient to a covered entity, all participants in that transfer will be required to follow HIPAA regulations and to sign a business associate agreement (BAA). This means that the data is encrypted at rest and in transit, and that technical, physical, and process safeguards are used to protect the information. Recent revelations that humans listen to voice utterances to improve the accuracy of the systems may have been surprising to some, but any type of machine-learned interaction is first trained by human classification. This may happen in a HIPAA-covered scenario as well, but a number of safeguards would need to be implemented, such as de-identifying the data, using trained personnel, and not outsourcing this activity.


While there are many considerations and very high expectations for voice-enabled care plans, based on the feedback we’ve seen from people using beta versions of voice care plans, they hold great promise for patient engagement, adherence, and ultimately better outcomes. Given their ubiquity in the home, their appeal across demographics from young to old, and their cost-effectiveness, they can have a broad impact in healthcare.

As you consider the right interfaces and interactions for using voice in your healthcare organization, the most important criterion is this:

Can you delight the user and inspire higher levels of healthcare engagement?

About the Author

Anne Weiler is CEO and co-founder of Wellpepper, a platform developer for interactive treatment plans that help patients engage through mobile, web, and voice interfaces.