Voice Pathology Classification by Using Features from High-Speed Videos

  • Authors:
  • Daniel Voigt;Michael Döllinger;Anxiong Yang;Ulrich Eysholdt;Jörg Lohscheller

  • Affiliations:
  • Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Erlangen, Germany D-91054 (all authors)

  • Venue:
  • AIME '09 Proceedings of the 12th Conference on Artificial Intelligence in Medicine: Artificial Intelligence in Medicine
  • Year:
  • 2009

Abstract

For the diagnosis of pathological voices, it is particularly important to examine the dynamic properties of the underlying vocal fold (VF) movements, which occur at a fundamental frequency of 100–300 Hz. To this end, a patient's laryngeal oscillation patterns are captured with state-of-the-art endoscopic high-speed (HS) camera systems capable of recording 4000 frames/second. To date, the clinical analysis of these HS videos is commonly performed subjectively via slow-motion playback. Hence, the resulting diagnoses are inherently error-prone and exhibit high inter-rater variability. This paper presents an objective method for overcoming this drawback, which employs a quantitative description and classification approach based on a novel image analysis strategy called Phonovibrography. The relevant VF movement information is extracted from the HS videos, and the spatio-temporal patterns of laryngeal activity are captured using a set of specialized features. As a performance reference, conventional voice analysis features are also computed. The derived features are analyzed with different machine learning (ML) algorithms on clinically meaningful classification tasks. The applicability of the approach is demonstrated using a clinical data set comprising individuals with normophonic and paralytic voices. The results indicate that the presented approach is a promising basis for reliable diagnosis support in the future.
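The two figures quoted in the abstract (a 4000 frames/second recording rate and a 100–300 Hz fundamental frequency) imply how many frames are available per VF oscillation cycle. The following is a minimal arithmetic sketch of that relationship; the function name `frames_per_cycle` is hypothetical and does not come from the paper.

```python
def frames_per_cycle(frame_rate_hz: float, fundamental_hz: float) -> float:
    """Return the number of video frames recorded during one oscillation cycle.

    Hypothetical helper for illustration only; not part of the paper's method.
    """
    if frame_rate_hz <= 0 or fundamental_hz <= 0:
        raise ValueError("frame rate and fundamental frequency must be positive")
    return frame_rate_hz / fundamental_hz


# Recording rate stated in the abstract.
FRAME_RATE_HZ = 4000.0

if __name__ == "__main__":
    # Lower bound, midpoint, and upper bound of the stated 100-300 Hz range.
    for f0 in (100.0, 200.0, 300.0):
        n = frames_per_cycle(FRAME_RATE_HZ, f0)
        print(f"{f0:.0f} Hz -> {n:.1f} frames per cycle")
```

At the stated rate, one cycle spans 40 frames at 100 Hz and roughly 13 frames at 300 Hz, so the temporal resolution per cycle is coarser for higher-pitched voices.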