How does voice work? – Pedro
Every day we talk, whisper, and sing (🛀🎵🧼). We effortlessly share our thoughts with those around us, or sometimes with a huge audience. Voice is an amazing symphony of coordinated movements, but we take it all for granted. It requires no conscious effort, but behind the scenes our brains orchestrate many different organs to generate each syllable, word, and sentence. First we produce the base sound – using our diaphragms to squeeze air from our lungs, past the larynx, vocal folds, velum, tongue, and lips. When you hear your friend’s voice and recognize them immediately, it’s partly thanks to the sound waves bouncing around in their nose. A bit gross, perhaps, but the vibrations through the nasal cavity contribute to the unique character of each of our voices.
Sounds are voiced (from vocal fold vibrations) or unvoiced (everything else), and then we apply a “filtering” process – moving our tongue, palate, and lips to make the “f” and “ss” sounds (known as fricatives) and the “p” and “t” sounds (known as plosives). In English, we combine it all into 44 phonemes – the individual sounds from which we build the words and sentences we use to communicate (1).
This doesn’t even touch on all of the changes and variation in different accents in English alone!
The power of voice through the centuries – Kate
Recently, I was listening to a podcast while driving my kids to school. The show was so fantastical and incredibly creepy that I was compelled to finish the episode even after my kids were out of the car. Scientists in London have made an Egyptian mummy “speak” again (2). They recreated the mummy’s vocal apparatus using CT images and a 3D printer to bring the voice of priest Nesyamun, aptly named “True of Voice,” back to life. Now, if you listen to this story and are expecting the priest to chant or speak in full sentences (as I was), you may be a little disappointed. They just made one sound. One. A long, vowel-like intonation, but when played over and over again, as it was in this kids’ podcast, it was mesmerizing. When I got to my desk I was still thinking about this story, and why this one sound had such an impact on me when I hear thousands of them every day.
Our product PredictionHealth is built on voice. The software listens to patient-doctor conversations and uses AI-assisted scribes to create the clinical documentation (almost as cool and futuristic as making a mummy talk with a 3D printer). Our voices and ears also help us take care of our patients. We covered ears in our recent article All Things Audio, but vocals are the other side of the communication coin. The story of Nesyamun’s resurrected vocal cords reminded me of how important voice can be in healthcare. Communicating makes us human, and to hear someone’s voice across the centuries and beyond their earthly life is almost spiritual. Technology – imaging and computer modeling – has allowed us to bring someone’s voice back to life after thousands of years. Can we use this technology to make healthcare better – to bring more life to our years?
Voice technology to the rescue! – Pedro
PredictionHealth’s mission is to build AI that understands what’s happening during each patient-clinician encounter to make healthcare more effective and efficient. This goes beyond interpreting phonemes – it’s like listening between the sound waves 😉. A conversation is an intricate dance, with much of its meaning hidden in the tone, volume, and pauses that can convey uncertainty, fear, and caring. As any experienced clinician knows, interpreting these nuances is key to understanding health. Speech analysis can even help determine depression and suicide risk (3,4), among other important health outcomes. Our future AI models will interpret what is said, how it’s being said, and whatever implications there are for the information’s accuracy and comprehensiveness.
Machines have become surprisingly good at interpreting images as seen in this study:
But the dense content of speech (sound waves sampled tens of thousands of times per second) is spread across time and influenced by the size of a room, the materials of the walls, the direction a person is facing when speaking, the technology capturing the sound, and even the compression algorithms used to condense and send that data for processing.
To help visualize this, here’s an example from some of our own audio research.
Simple spoken phrases turned into images (spectrograms):
And here’s what a conversation "looks" like:
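For readers curious how sound becomes a picture, here is a minimal sketch of a spectrogram computation. It is an illustration under stated assumptions, not PredictionHealth’s actual pipeline: the `spectrogram` function, the 16 kHz sample rate, the 25 ms frame, and the 10 ms hop are all hypothetical choices (common defaults in speech processing), and a synthetic 440 Hz tone stands in for a recorded voice.

```python
import numpy as np


def spectrogram(signal, sample_rate, frame_len=400, hop=160):
    """Short-time Fourier transform magnitude, in decibels.

    Hypothetical helper for illustration. With a 16 kHz sample rate,
    frame_len=400 is a 25 ms window and hop=160 is a 10 ms step.
    Returns (times, freqs, magnitude_db), where magnitude_db has one
    row per time frame and one column per frequency bin.
    """
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One FFT per frame: how much energy sits at each frequency in that slice of time.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    magnitude_db = 20 * np.log10(magnitude + 1e-10)  # small offset avoids log(0)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return times, freqs, magnitude_db


# Assumption: one second of a pure 440 Hz tone stands in for real speech.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)

times, freqs, spec_db = spectrogram(signal, sample_rate)

# The loudest frequency bin, averaged over time, should sit near the tone's pitch.
peak_hz = freqs[spec_db.mean(axis=0).argmax()]
print(spec_db.shape, peak_hz)
```

Plotting `spec_db` as an image, with time on one axis and frequency on the other, gives the kind of picture described here. Real speech would show shifting bands of energy rather than the single steady line a pure tone produces.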
Understanding and harnessing voice is complex, but an untiring, well-trained AI helper is just what the doctor ordered – making life easier by taking care of the drudgery of documentation, and eventually providing gentle reminders or tracking down useful information just in time.
For over 200 years clinicians have walked into the patient exam room with a stethoscope so they could more easily and accurately listen to the patient’s heart. Soon every physician will walk in with a new tool – an AI-scribe – freeing you to focus on your patients again.
Get in touch with us to learn more about PredictionHealth. We’d love to hear from you.
Pedro Teixeira, MD, PhD and Kate Celauro, MD