Vocal Biomarkers for Mental Fitness Scoring and Tracking

 

Fitness Tracking for the Mind

Easy tools for tracking our steps, sleep, heart rate, and diet have become part of our daily lives. We intuitively understand that following these dimensions of physical health is important. Easy access to continuous, objective measures of these physical health inputs and outputs has been an important factor in catalyzing the ongoing shift toward maintaining physical fitness to reduce the likelihood of a number of chronic diseases. Tracking and improving these measures is at the heart of the value proposition for many new digital products that millions of people use to live more active and physically healthy lives.


Wouldn’t it be amazing if we could also track aspects of our mental health just as easily as we track our steps? Until now, opportunities to develop and promote a similar shift in mental health toward maintaining mental fitness have been hampered by the lack of simple, widely available tools to detect and track what is happening “above the neck” objectively over time. If we define physical fitness as a global measure of our body’s ability to function efficiently and effectively, resist disease, and deal with (and recover from) emergency situations, then mental fitness can be similarly defined as a measure of our mind’s ability to function and feel its best, to resist declines in mood and motivation under stress and fatigue, and to recover quickly after experiencing sadness or anxiety.


Sonde has developed a new voice analysis technology that can effortlessly detect and track symptoms and physiological changes associated with depression, anxiety, stress, and burnout. We are working internally and with partners to build digital products that transform the way we understand and manage our mental health. These products lay the foundation for a shift toward maintaining mental fitness, both to reduce the likelihood of depression and anxiety and to address care management challenges for the rapidly growing number of people across the world who suffer from anxiety, depression, and related conditions.


The Sonde Health Mental Fitness App


Tracking mental fitness using voice is based on the notion that changes in mental fitness impact our bodies, not only our minds. For example, clinically depressed people experience decreases in heart rate, depth of breathing, and serotonin levels. Depression also impacts characteristics of a person’s voice, both in the words they say and in how they speak and sound. From a linguistic perspective, depressed speakers may use more negative words than non-depressed peers. Acoustic profiles (the way a voice sounds) also differ between depressed and non-depressed individuals, regardless of what words are being spoken. These differences in the acoustic profiles of speakers with high and low mental fitness can be measured using vocal biomarkers.


Sonde Health’s Mental Fitness smartphone app (shown in Figure 1) uses cutting-edge technology to detect vocal biomarkers that are indicative of a person’s mental fitness. Users simply speak into the microphone on a smart device for 30 seconds. Sonde’s software analyzes the person’s voice in real time to provide an immediate evaluation of their mental fitness biomarkers. Because smartphones are estimated to account for 50% of all mobile devices used by 5 billion people worldwide (and more than 80% in Western countries), Sonde’s technology offers an accessible, private, non-invasive, and cost-effective way to evaluate mental fitness. Furthermore, Sonde makes it easy for both individuals and organizations to be proactive about health and wellness.

Figure 1. The Sonde Health mental fitness app provides speech features from voice and allows users to track their trends over time.


Mental Fitness Tracking Value Propositions


Individual User Value Propositions

Insight into health and wellness is empowering. Learning about your health can guide you to seek support from family, social, and professional networks and from healthcare professionals whenever you need it. Sonde’s mental fitness app provides a new way to objectively measure mental fitness that fits seamlessly into busy daily routines and habits with complete privacy. It provides an innovative early warning tool for symptoms of mental health conditions like depression. The Sonde app also enables self-reporting of moods and offers suggestions on how to help increase mental fitness by providing links to articles about the benefits of meditation, exercise, and more. Monitoring mental fitness over time through daily use of Sonde’s app becomes a proactive self-management approach, forging a new frontier in personalized health and fitness tracking.


“Doing my health check in the morning has become part of my daily routine; I’m excited that I’m able to participate in evaluating a new technology that is giving me a different way to handle my problems with anxiety” — Sonde User


Employer and Educational Institution Value Propositions


The COVID-19 pandemic has further validated the role of the employer in delivering healthcare, and more specifically mental health care services, to employees. Offering the most comprehensive healthcare benefits will allow employers to differentiate themselves in the competition for talent recruiting and retention. Utilization of Employee Assistance Programs (EAPs) deployed across employer sites remains very low relative to the prevalence of depression among employees. Sonde’s mental fitness scoring API can be integrated into any EAP or wellness platform to increase employee engagement and utilization of mental health services. Furthermore, employee productivity studies show that absenteeism increases by up to 51% for employees experiencing depression. Given a post-pandemic depression prevalence of 27.5%, huge strides can be made in employee productivity and retention by supporting employees with a solution that offers objective mental fitness monitoring.


Clinical Value Propositions


Sonde’s mental fitness app offers smartphone-based rapid evaluations that detect and monitor symptoms broadly consistent with clinical assessments. The Cognitive Behavior Institute (CBI), a group private practice in Pennsylvania offering therapeutic and behavioral health treatments, has piloted Sonde’s mental fitness app (see this website for a case study: https://www.sondehealth.com/use-case-cbi). Participants included CBI clients in treatment for conditions such as depression and anxiety. More than 50% of the participating clients voluntarily used Sonde’s app every day and many others used the app several times a week for the duration of the 3-month pilot. Furthermore, more than 90% of CBI’s participating clients completed the entire pilot. This high level of engagement points to the feasibility and acceptance of objective mental fitness symptom monitoring.


Even for people who meet regularly with a mental health professional, their appointments usually occur once a week or less. By using the Sonde app daily or every other day, the client and provider can monitor measurements of how things are going between appointments. Most importantly, the Sonde app can help keep clients engaged when not seeing their provider. For example, clinicians at CBI can see client mental fitness scores and engagement levels with the Sonde app, as well as in-app responses to other clinically relevant reporting (such as the PHQ-9 or GAD-7). The inclusion of Sonde’s mental fitness app, as an addition to professional mental health treatment, may help mental health professionals evaluate the effectiveness of their therapy techniques and medications for each client.


Research and Development Value Propositions


Sonde Health is the world’s leader in identifying and building products around meaningful and robust vocal biomarkers. Sonde’s speech features have been employed at scale across ages, genders, nations, and microphone types. Importantly, all voice recordings are collected in real-world environments on consumer-owned devices, allowing our technology to perform in the hands of individual customers, not just in controlled laboratory conditions. Sonde has partnered with over thirty pharmaceutical and clinical research teams to build and begin to analyze the world’s largest and broadest digital biobank of voice and clinical health labels. Sonde’s full-featured services have been used to help partners rapidly collect, curate, and analyze voice data, accelerating vocal biomarker development in settings from clinical trials to consumer products.


Sonde’s API (https://www.sondehealth.com/developer-documentation) allows users to easily access Sonde’s acoustic and linguistic vocal biomarkers. Pharmaceutical and industry partners can use the API in a variety of applications, including more than 48 different types of speech tasks and across a wide range of health conditions. Sonde’s API can be used to allow companies to quickly prototype R&D solutions to pressing health-related problems.


In addition to using Sonde’s acoustic features, Sonde offers a survey-building platform called SurveyLex (https://www.sondehealth.com/voice-surveys). Through SurveyLex, users can create customized protocols or use previously-validated protocols from published vocal biomarker research. All users can easily download the survey responses, audio files, transcriptions, and acoustic features from the SurveyLex website.


When we are mentally fit, we flourish. Through increased awareness of mind and body, we can take charge of our journey toward health and wellness. Sonde: Your Health Speaks.

If you'd be interested in a partnership with Sonde, please schedule time for a meeting with our Vice President of Business Development, Anya Gupta, at this link: https://calendly.com/sonde-anyagupta


Knowledge is Power: The Science Behind Sonde Health’s Solution


The voice of health


Sonde Health’s mental fitness app empowers people to take control of their mental health by providing objective and numerical mental fitness scores using 30-second voice recordings. Sonde’s mental fitness app draws on years of research and more than 1 million voice recordings from over 80,000 individuals. Voice samples were collected in several hospitals in India, an outpatient mental health clinic in the United States, as well as through online research. All participants completed the Patient Health Questionnaire (PHQ-9), a self-administered mental health evaluation validated by clinicians as a method to screen for clinical depression. There are nine multiple-choice questions in the PHQ-9, each of which corresponds to a prominent depressive symptom. All protocols were approved by an Institutional Review Board and all data were encrypted on the smart-device and transmitted to secure cloud storage.
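To make the PHQ-9 labels used in this paper concrete, the sketch below scores a completed questionnaire. It assumes the standard published PHQ-9 convention, which is not spelled out in this document: each of the nine items is scored from 0 to 3, the total ranges from 0 to 27, and totals are grouped into five severity bands. The function names are illustrative only and are not part of any Sonde product.

```python
def phq9_total(item_scores):
    """Sum the nine PHQ-9 item scores (each an integer from 0 to 3)."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 has exactly nine items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(item_scores)


def phq9_severity(total):
    """Map a PHQ-9 total (0 to 27) to its conventional severity band."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 totals range from 0 to 27")
    if total <= 4:
        return "minimal"
    if total <= 9:
        return "mild"
    if total <= 14:
        return "moderate"
    if total <= 19:
        return "moderately severe"
    return "severe"


if __name__ == "__main__":
    # Hypothetical answers that add up to a high score.
    answers = [3, 3, 3, 3, 3, 3, 2, 2, 1]
    total = phq9_total(answers)
    print(total, phq9_severity(total))
```

Under this convention, a total of 0 falls in the lowest band and a total of 23 falls in the highest band.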


Your health speaks


There are more than 100 muscles that are used every time you make a speech sound. Coordinating all of these muscle movements requires input from the nervous system, including the brain. Some parts of the brain that are activated when you speak are the auditory system, the sensory system, and the language perception system. Health conditions, such as clinical depression, can result in reduced coordination between the brain and the muscles used in speech production. When coordination is interrupted, there are measurable differences in certain parts of the speech signal. Some of these speech differences are audible to the human ear, whereas others can only be detected through signal processing, such as Sonde’s technology.

 

Figure 2 offers an example of how depressed and non-depressed speech differ. Two females in their early thirties recorded answers to the prompt, “How have you been feeling lately?” The non-depressed speaker, represented in Figure 2.a, responded by saying, “Pretty good, energetic. I’ve been motivated to go look for more event work, you know, event planning stuff.” The depressed speaker in Figure 2.b responded with, “Stressed, um, uh, lately I’ve been really sad and I don’t know why.” In this example, the depressed speaker has less energy in her voice than the non-depressed speaker. The depressed speaker also talks at a slower rate than the non-depressed speaker.

 
(a) Non-depressed female (PHQ-9=0)


(b) Depressed female (PHQ-9=23)


 

Figure 2. Waveforms (top images) and spectrograms (bottom images) of speech from a non-depressed and a depressed female speaker. The pauses and slower speaking rate of the depressed speaker are shown in the top (blue) graphs, representing the recorded microphone signal, by the long sections that look like a line, representing silence. The two bottom images show the amount of energy in a speaker’s voice at high and low tones: more vocal energy corresponds with brighter colors, so that yellow represents the most energy, followed by orange, red, and finally black, which means no energy. Figure created by Zhaocheng Huang, University of New South Wales.

Sonde’s voice visualizations for enhanced mental fitness insights

Sonde’s scientists and researchers around the world have analyzed voice recordings to find vocal features that vary among individuals with high and low mental fitness. In the Mental Fitness app, five of these speech features are shown to users, including how their voice compares to other voices.

Smoothness

Speech smoothness describes how much a person’s vocal pitch changes when they speak. Pitch is a measure of how low or high your voice sounds. The pitch of your voice depends on how quickly or slowly your vocal folds (also called vocal cords) vibrate when you speak. A person’s gender and size impact the pitch of their voice. People can also deliberately change the pitch of their voices in order to emphasize certain words and convey emotion to others.

Each time the vocal folds vibrate, they produce a slightly different pitch. Most times, consecutive vocal fold vibrations have pitches very close together. Other times, there are noticeable differences in pitch from one vibration to the next. When a person’s pitch stays steady over time, their voice sounds smooth. On the other hand, when the vocal fold vibrations rapidly vary from high to low pitches, a voice might sound rough.

Biologically, voices become less smooth when a person has reduced control over their vocal tract muscles. For example, when a person is nervous, they might have low smoothness. Changes in muscle tone, fatigue, inflammation, as well as mental health can affect this degree of control. Research conducted at Vanderbilt University showed that there was less smoothness (“jitter” is the scientific term used to describe deviations from smoothness) in depressed individuals than in non-depressed people (Ozdas et al., 2004).
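As a rough illustration of how jitter can be quantified, the sketch below computes "local jitter", one common textbook definition: the average absolute difference between consecutive vocal fold cycle durations, divided by the average cycle duration. It assumes the cycle durations (in seconds) have already been extracted from a recording by some pitch-tracking step that is not shown. This is a generic sketch, not a description of how Sonde computes its smoothness feature.

```python
def local_jitter(periods):
    """Return local jitter for a sequence of vocal fold cycle durations.

    periods: cycle durations in seconds, one per vocal fold vibration.
    The result is a unitless ratio; 0.0 means a perfectly steady pitch.
    """
    if len(periods) < 2:
        raise ValueError("need at least two cycles to compare")
    # Absolute change in duration from each cycle to the next.
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period


if __name__ == "__main__":
    steady = [0.010, 0.010, 0.010, 0.010]  # constant 100 Hz
    rough = [0.010, 0.012, 0.010, 0.012]   # alternating cycle lengths
    print(local_jitter(steady), local_jitter(rough))
```

A higher value corresponds to a rougher, less smooth voice in the sense described above.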

Control

Speech control describes small changes in air pressure as you speak. When you speak, air moves from your lungs, through your vocal folds, and then out of your mouth. The amount of air moving at a particular point in time is called air pressure. Differences in air pressure can change how loud or soft your voice sounds. 

Sometimes, the amount of air moving through your vocal tract changes quickly, whereas, other times, the amount of air stays stable over time. A controlled voice is one where the air pressure stays stable over time; lack of such control is sometimes described as a breathy voice.

Rapid changes in air pressure are the result of reduced control over voice production. Many physiological differences can lead to breathy voices. For example, control decreases when you are tired, when your vocal folds are inflamed, or if you develop vocal nodules. Decreases in voice control have also been associated with decreases in mental fitness (e.g., Hönig et al., 2014; Quatieri & Malyska, 2012, using the term “shimmer” to describe lack of control).
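Shimmer can be illustrated in the same spirit. The sketch below uses a common textbook definition of "local shimmer": the average absolute difference between the peak amplitudes of consecutive vocal fold cycles, divided by the average peak amplitude. It assumes the per-cycle peak amplitudes have already been measured by a step that is not shown, and it is a generic example rather than Sonde's implementation of the control feature.

```python
def local_shimmer(peak_amplitudes):
    """Return local shimmer for per-cycle peak amplitudes.

    peak_amplitudes: one positive amplitude value per vocal fold cycle.
    The result is a unitless ratio; 0.0 means perfectly stable loudness
    from cycle to cycle.
    """
    if len(peak_amplitudes) < 2:
        raise ValueError("need at least two cycles to compare")
    mean_amp = sum(peak_amplitudes) / len(peak_amplitudes)
    if mean_amp <= 0:
        raise ValueError("amplitudes must have a positive mean")
    # Absolute change in amplitude from each cycle to the next.
    diffs = [abs(b - a) for a, b in zip(peak_amplitudes, peak_amplitudes[1:])]
    return (sum(diffs) / len(diffs)) / mean_amp


if __name__ == "__main__":
    controlled = [0.50, 0.50, 0.50, 0.50]
    unstable = [0.50, 0.40, 0.50, 0.40]
    print(local_shimmer(controlled), local_shimmer(unstable))
```

Here a larger value indicates less control over the voice from one cycle to the next.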

Liveliness

Liveliness also involves natural changes in how high or low a person’s voice sounds as they speak. Whereas smoothness measures rapid involuntary pitch changes, liveliness measures the distance between the highest and lowest deliberate pitch change over a certain amount of time. 

Using music as an analogy, liveliness may be measured as the distance between two notes in a musical scale. The distance from one note called “C” to the next note called “C” is called an octave. From the lowest to the highest note in an octave, the frequency of the sound vibrations doubles. On a piano, the distance between note C and note D is small: it is about 17% or 0.17 of an octave. That means that the frequency of the sound wave doesn’t change much. The distance between note C and note B is much bigger: it is 92% or 0.92 of an octave. This larger distance means that the frequency of the higher note (B) is almost double the frequency of the lower note (C).

We can use the musical octave to measure the distance between the highest and lowest pitch of your voice when you speak. A voice with a large pitch range sounds lively. When we speak, we use changes in pitch for vocal variety to emphasize certain words or to express emotion to another person. To a listener, voices with reduced liveliness are often described as sounding more monotone. Speech researchers and mental health clinicians have observed that when a person has lower mental fitness, their liveliness is more likely to be reduced (e.g., Breznitz et al., 1992; Darby et al., 1984; Hönig et al., 2014; Hussenbocus et al., 2015; Kuny & Stassen, 1993; Mundt et al., 2007; Nilsonne, 1987; Nilsonne et al., 1988; Porritt et al., 2015; Simantiraki et al., 2017; Stassen et al., 1998; Tolkmitt et al., 1982).
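The octave arithmetic can be written down directly: because frequency doubles over one octave, the distance in octaves between two frequencies is the base-2 logarithm of their ratio. The sketch below applies this to piano keys (assuming the usual tuning with twelve equal steps per octave) and to a pitch track. The pitch track format, a list of values in hertz with 0 marking unvoiced frames, is an assumption made for illustration, and the code is not Sonde's liveliness computation.

```python
import math


def octaves_between(f_low, f_high):
    """Distance in octaves between two frequencies in hertz."""
    return math.log2(f_high / f_low)


def pitch_range_octaves(pitch_hz):
    """Octave distance between the lowest and highest voiced pitch.

    pitch_hz: pitch estimates in hertz; values <= 0 mark frames with
    no voicing (silence or unvoiced sounds) and are ignored.
    """
    voiced = [f for f in pitch_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in the pitch track")
    return octaves_between(min(voiced), max(voiced))


if __name__ == "__main__":
    # Eleven piano keys up from C lands on B: 11/12 of an octave.
    c4 = 261.63
    b4 = c4 * 2 ** (11 / 12)
    print(round(octaves_between(c4, b4), 2))
    # A hypothetical monotone voice versus a livelier one.
    print(pitch_range_octaves([0, 118, 120, 122, 0]))
    print(pitch_range_octaves([0, 110, 165, 220, 0]))
```

In practice, a robust measure would likely trim outliers before taking the minimum and maximum, since a single pitch-tracking error can otherwise dominate the range.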

Energy range

Energy range also involves changes in how loud or soft a person’s voice sounds as they speak. Sound volume is measured in decibels (dB), which range from about 0 dB (totally quiet) to 190 dB (the loudest possible sound). A very soft sound, such as whispering, has a volume of about 20 dB. A loud sound, such as a power saw, has a volume of about 110 dB. It takes more energy to speak loudly than softly. An increase in volume of 10 dB will sound about twice as loud.


Energy range measures the difference between the loudest and softest volume as you are speaking. Some people’s voices may naturally have volumes that range from very soft to very loud. These voices sound more energetic or dynamic. Other voices may always stay at a similar volume and sound less energetic.


When we speak, we use energy range (changes in loudness) for emphasis and to engage a listener. Scientific research has shown that people who are disengaged, fatigued, or who have reduced mental health have less energy range than people who are energetic or who have higher mental fitness (Quatieri & Malyska, 2012).
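One simple way to put a number on energy range is to split a recording into short frames, compute each frame's level in decibels, and take the difference between the loudest and softest frame. The sketch below does this for a list of audio samples scaled between -1 and 1. The frame length, the small floor that avoids taking the logarithm of zero, and the choice to use every frame are all assumptions for illustration; a practical system would probably exclude pauses first. This is not Sonde's implementation.

```python
import math


def frame_levels_db(samples, frame_len, floor=1e-10):
    """Return the level in dB of each non-overlapping frame.

    Levels are relative to a full-scale amplitude of 1.0, so they are
    zero or negative; only differences between them are used here.
    """
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20 * math.log10(max(rms, floor)))
    return levels


def energy_range_db(samples, frame_len):
    """Difference in dB between the loudest and softest frame."""
    levels = frame_levels_db(samples, frame_len)
    if not levels:
        raise ValueError("recording is shorter than one frame")
    return max(levels) - min(levels)


if __name__ == "__main__":
    # A soft stretch followed by one with ten times the amplitude.
    samples = [0.1] * 100 + [1.0] * 100
    print(round(energy_range_db(samples, frame_len=100), 1))
```

Because the result is a difference of two levels, it does not depend on the reference used for the decibel scale.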


Clarity

Clarity measures how understandable (or intelligible) your voice is to others. When we speak, air moves through the vocal tract, which is a space that runs from your lips to the back of your mouth and down your throat. The shape of each person’s vocal tract is unique. Because of differences in vocal tract shape, each person’s voice has a unique tone color. This is how you can recognize someone’s voice when you can’t see them, like over the phone.


When a person talks, they use different muscles, like the tongue, jaw, and lips, to form words. Using these speech muscles changes the shape of the person’s vocal tract. Different shapes lead to different vowel sounds, such as “ahhh” or “eee.” Speaking clearly requires precise movements of the muscles that help with speech articulation, such as the tongue. A clear voice will have vowel sounds that are more distinct. If a person has trouble coordinating the muscles involved with speech, their voice will sound more mumbled. In a mumbled voice, multiple vowel sounds may sound similar, so another person may have trouble understanding the speech.


One symptom of clinical depression is having trouble with speech articulation. Research shows that clinically depressed speakers tend to have voices that are less clear or understandable to others – an observation first made in the early 20th century and used since then by clinical practitioners as a valuable signal of changes in mental health and fitness (Kraepelin, 1921).


Sonde Health’s vocal biomarker research in mental health

Cummins, N., Scherer, S., Krajewski, J., Schnieder, S., Epps, J., & Quatieri, T. F. (2015). A review of depression and suicide risk assessment using speech analysis. Speech Communication, 71, 10-49. https://www.sciencedirect.com/science/article/abs/pii/S0167639315000369

Huang, Z., Epps, J., & Joachim, D. (2019a). Investigation of speech landmark patterns for depression detection. IEEE Transactions on Affective Computing. https://ieeexplore.ieee.org/document/8861018

Huang, Z., Epps, J., & Joachim, D. (2019b). Speech landmark bigrams for depression detection from naturalistic smartphone speech. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5856-5860). IEEE. https://www.researchgate.net/publication/332790688_Speech_Landmark_Bigrams_for_Depression_Detection_from_Naturalistic_Smartphone_Speech

Huang, Z., Epps, J., Joachim, D., & Chen, M. (2018). Depression Detection from Short Utterances via Diverse Smartphones in Natural Environmental Conditions. In INTERSPEECH (pp. 3393-3397). https://www.researchgate.net/publication/327389040_Depression_Detection_from_Short_Utterances_via_Diverse_Smartphones_in_Natural_Environmental_Conditions

Huang, Z., Epps, J., Joachim, D., & Sethu, V. (2019). Natural language processing methods for acoustic and landmark event-based features in speech-based depression detection. IEEE Journal of Selected Topics in Signal Processing, 14(2), 435-448. https://www.researchgate.net/publication/336790978_Natural_Language_Processing_Methods_for_Acoustic_and_Landmark_Event-Based_Features_in_Speech-Based_Depression_Detection

Huang, Z., Epps, J., Joachim, D., Stasak, B., Williamson, J. R., & Quatieri, T. F. (2020). Domain adaptation for enhancing Speech-based depression detection in natural environmental conditions using dilated CNNs. Interspeech 2020, 4561-4565. http://www.interspeech2020.org/uploadfile/pdf/Thu-3-1-4.pdf

Ozkanca, Y., Öztürk, M. G., Ekmekci, M. N., Atkins, D. C., Demiroglu, C., & Ghomi, R. H. (2019). Depression screening from voice samples of patients affected by Parkinson’s Disease. Digital Biomarkers, 3(2), 72-82. https://www.karger.com/Article/FullText/500354

Stasak, B., & Epps, J. (2017, October). Differential performance of automatic speech-based depression classification across smartphones. In 2017 Seventh International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW) (pp. 171-175). IEEE. https://www.researchgate.net/publication/319253038_Differential_Performance_of_Automatic_Speech-Based_Depression_Classification_Across_Smartphones

Stasak, B., Joachim, D., & Epps, J. (in press). An Investigation of Automatic Age-Dependent Speech-Based Depression Detection.

Zhang, L., Driscol, J., Chen, X., & Hosseini Ghomi, R. (2019, October). Evaluating Acoustic and Linguistic Features of Detecting Depression Sub-Challenge Dataset. In Proceedings of the 9th International on Audio/Visual Emotion Challenge and Workshop (pp. 47-53). https://dl.acm.org/doi/10.1145/3347320.3357693


Zhang, L., Duvvuri, R., Chandra, K. K., Nguyen, T., & Ghomi, R. H. (2020). Automated voice biomarkers for depression symptoms using an online cross‐sectional data collection initiative. Depression and Anxiety, 37(7), 657-669. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3408093