Frequently Asked Questions

If you don’t see what you are looking for, please reach out to our team at support@canaryspeech.com.

How does Canary technology work?

Voice contains inherent qualities that can be used as vocal biomarkers to reveal emotional, physiological, and cognitive states. Canary Speech uses these biomarkers to create models that evaluate disease from conversational speech in seconds.

What is the ideal audio recording size for analysis?

A recording length of 20 to 40 seconds is ideal.

What parts of speech are used to create vocal models?

A proprietary system extracts speech features for the deep learning model from the audio signal (the vocal sample). These features cover three elements of speech: acoustic, prosodic, and linguistic.

How is data acquired when creating vocal models?

We have built highly accurate predictive models through machine learning. Alongside our respected healthcare partners, we have collected data samples of patient voices to train our models. There are several types of data used to build a robust machine learning model:

  • First, to learn which biomarkers are indicative of a specific condition, the algorithms are given samples from patients already diagnosed with the condition and compare them against samples from healthy controls.
  • Second, samples from a variety of people are used to incorporate diverse demographics.
  • Third, the model incorporates data from a range of acoustic environments and speech input devices to account for the different settings in which the technology may be used.
  • Finally, we use a technique called data augmentation, which changes audio characteristics to form new training examples, to improve the performance and outcomes of the machine learning models.
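As an illustration of the data augmentation step, here is a minimal Python sketch that forms new examples from one recording by adding background noise and changing the gain. It is not Canary Speech's actual pipeline. The function names, the 20 dB signal-to-noise ratio, the ±6 dB gain values, and the synthetic tone standing in for a voice sample are all assumptions chosen for the example.

```python
import math
import random


def add_noise(samples, snr_db, rng):
    """Return a copy of `samples` with Gaussian noise at the given SNR (dB)."""
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def change_gain(samples, gain_db):
    """Return a copy of `samples` scaled by `gain_db` decibels."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


def augment(samples, rng):
    """Build three hypothetical variants of one recording."""
    return [
        add_noise(samples, snr_db=20, rng=rng),  # noisier environment
        change_gain(samples, gain_db=-6),        # quieter input device
        change_gain(samples, gain_db=6),         # louder input device
    ]


# A 440 Hz tone (0.1 s at 16 kHz) stands in for a real voice sample.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]
variants = augment(tone, random.Random(0))
```

Each variant keeps the original label, so one labeled recording yields several training examples that differ only in audio characteristics.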

What does the vocal stress score indicate?

Stress can be defined as any type of change that causes physical, emotional, or psychological strain. Stress is your body’s response to anything that requires attention or action. The Canary Vocal score reflects the level of stress based on vocal biomarkers in speech. The stress score falls into one of three categories: Mild, Moderate, or High.

What does the vocal mood score indicate?

Mood can be defined as an emotional state that may last anywhere from a few minutes to several weeks. The Canary Vocal score reflects an individual’s mood based on vocal biomarkers in speech. The mood score falls into one of three categories: Low, Good, or Excellent.

Glossary of acoustic features

  • Mel-frequency cepstral coefficients (MFCC): The coefficients that collectively make up a mel-frequency cepstrum (MFC), a representation of the short-term power spectrum of sound
  • Perceptual linear predictive (PLP) coefficients: An alternative to MFCC that combines spectral analysis and linear prediction analysis
  • Pitch: The perceived fundamental frequency of the speech signal, which is the inverse of its fundamental period
  • Spectral flux: A measure of how quickly the power spectrum of a signal is changing
  • Spectral centroid: A measure of where the center of mass of the spectrum is located
  • Spectral bandwidth: The spread of the signal spectrum around its centroid
  • Spectral contrast: Decibel difference between spectral peaks and valleys
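  • Spectral flatness: A measure of how noise-like a sound is, as opposed to resembling a pure tone. A value near 0 indicates a tone-like sound and a value near 1 indicates a noise-like sound
  • Spectral roll-off: The frequency below which a specified percentage of the total spectral energy lies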
  • Harmonics-to-noise ratio (HNR): The ratio between periodic and non-periodic components of a speech sound
  • F0: Fundamental frequency of a speech signal, approximate frequency of the (quasi-)periodic structure of the voiced speech signal
  • Jitter: Cycle-to-cycle variations in signal frequency
  • Shimmer: Cycle-to-cycle variations in signal amplitude
  • WER: Word error rate. It is usually a measure of automatic speech recognition (ASR) accuracy. We use it to measure how well speech is articulated compared with the reading script.
  • Word_prob: The probability of a spoken word appearing in a large corpus. Common words such as “happy” and “thank” get a high probability, and uncommon words such as “extraordinary” and “canary” get a low probability.
  • Filler ratio: Ratio of filler word usage (hmm, uh, oh, eh, …)
  • SYN: Syntactic part-of-speech ratio, for example ADJ (the ratio of adjective usage).
  • Lexical difficulty: SMOG grade, age of acquisition of words, concreteness, ambiguity, familiarity
  • OOV: Out of vocabulary (a word the ASR cannot recognize)
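Two of the linguistic features in the glossary, WER and filler ratio, can be sketched in a few lines of Python. This is a generic illustration rather than Canary Speech's implementation. WER is computed here with the standard word-level edit distance, and the function names, the whitespace tokenization, and the filler set (limited to the four fillers named in the glossary) are assumptions for the example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of step i, prev[j] is the distance between
    # the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Assumed filler set: only the fillers listed in the glossary entry.
FILLERS = {"hmm", "uh", "oh", "eh"}


def filler_ratio(transcript):
    """Fraction of words in `transcript` that are filler words."""
    words = [w.strip(".,!?") for w in transcript.lower().split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in FILLERS for w in words) / len(words)
```

For example, comparing the reading script "the quick brown fox" with the recognized text "the quick fox" counts one deleted word out of four reference words.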

What does the vocal energy score indicate?

The composite energy score consists of three elements from speech: loudness, speed, and dynamics. Each element is scored on a range from 1 to 100.

  • Loudness measures the amount of air from the lungs that passes through the larynx. A higher value indicates more airflow and speech that is perceived as louder.
  • Speed estimates the speech rate, i.e., how fast one speaks. It approximates words per minute (WPM) in a language-agnostic way.
  • Dynamics models how the speech energy changes over time. A lower value represents more monotonous speech.
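To make loudness and dynamics more concrete, the Python sketch below uses two common signal-level proxies: the mean root-mean-square (RMS) energy of short frames for loudness, and the standard deviation of that frame energy for dynamics. These proxies, the 25 ms frame size, and the function names are assumptions for illustration. How Canary Speech computes these elements and maps them onto the 1 to 100 range is not described here, so no scaling is attempted, and speed is omitted because it requires word or syllable timing.

```python
import math


def frame_rms(samples, frame_size):
    """RMS energy of each non-overlapping frame of `frame_size` samples."""
    starts = range(0, len(samples) - frame_size + 1, frame_size)
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_size]) / frame_size)
        for i in starts
    ]


def loudness_and_dynamics(samples, frame_size=400):
    """Return (mean frame RMS, standard deviation of frame RMS)."""
    rms = frame_rms(samples, frame_size)
    if not rms:
        raise ValueError("signal is shorter than one frame")
    mean = sum(rms) / len(rms)
    variance = sum((r - mean) ** 2 for r in rms) / len(rms)
    return mean, math.sqrt(variance)


sample_rate = 16000  # 400 samples per frame is 25 ms at this rate

# Constant-amplitude tone: frame energy stays flat (monotonous).
steady = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate)
          for n in range(8000)]
# Same tone with amplitude ramping from 0 to 1: frame energy changes.
ramped = [(n / 8000) * math.sin(2 * math.pi * 200 * n / sample_rate)
          for n in range(8000)]

steady_loudness, steady_dynamics = loudness_and_dynamics(steady)
ramped_loudness, ramped_dynamics = loudness_and_dynamics(ramped)
```

The steady tone yields a dynamics value close to zero, while the ramped tone yields a clearly larger one, which matches the idea that a lower value represents more monotonous speech.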

What is a biomarker, and more specifically, a vocal biomarker?

The FDA defines a digital biomarker as a characteristic or set of characteristics, collected from digital health technologies, that is measured as an indicator of normal biological processes, pathogenic processes, or responses to an exposure or intervention. A vocal biomarker is a feature, or a combination of features, from the audio signal of the voice that is associated with a clinical outcome.

What technologies were used to create Canary Speech?

Machine learning, a form of artificial intelligence, is at the core of Canary Speech’s technology.  We use state-of-the-art machine learning methods, including deep neural networks, to automatically learn to make predictions based on features extracted from labeled data.

What is the reliability of the metrics?

One example of reliability is our anxiety model, in which the vocal model performed better than the current alternative, the GAD-7. Canary Speech’s binary classification model had a sensitivity of 0.70 and a specificity of 0.54.
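For readers unfamiliar with these two metrics, the short Python sketch below shows how sensitivity and specificity are computed from the outcomes of a binary classifier. The counts are hypothetical and were chosen only so that the arithmetic reproduces the figures above. They are not the actual study data.

```python
def sensitivity(true_positives, false_negatives):
    """Share of people with the condition that the model flags."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of people without the condition that the model clears."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts: 100 people with the condition, 100 without.
tp, fn = 70, 30  # with the condition: flagged vs. missed
tn, fp = 54, 46  # without the condition: cleared vs. wrongly flagged

model_sensitivity = sensitivity(tp, fn)
model_specificity = specificity(tn, fp)
```

With these illustrative counts, the model correctly flags 70 of the 100 people with the condition and correctly clears 54 of the 100 people without it.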

How large are the data sets?

Data sets for each disease range in size from the hundreds to the tens of thousands and differ based on the quantity required for science-based machine learning techniques.

Is my information secure?

Canary is backed by a team of experts working tirelessly to maintain data security, from our development lifecycle to continuous monitoring of our infrastructure and applications. Our technology is HIPAA-compliant and offers fully anonymous solutions. It is also CIS Control audited and vulnerability scanned, and it has cleared all API penetration testing.