Combining image, voice, and the patient's questionnaire data to categorize laryngeal disorders

  • Authors:
  • Antanas Verikas;Adas Gelzinis;Marija Bacauskiene;Magnus Hållander;Virgilijus Uloza;Marius Kaseta

  • Affiliations:
  • Department of Electrical & Control Equipment, Kaunas University of Technology, Studentu 50, LT-51368, Kaunas, Lithuania and Intelligent Systems Laboratory, Halmstad University, Box 823, S 301 18 H ...;Department of Electrical & Control Equipment, Kaunas University of Technology, Studentu 50, LT-51368, Kaunas, Lithuania;Department of Electrical & Control Equipment, Kaunas University of Technology, Studentu 50, LT-51368, Kaunas, Lithuania;Intelligent Systems Laboratory, Halmstad University, Box 823, S 301 18 Halmstad, Sweden;Department of Otolaryngology, Kaunas University of Medicine, Eiveniu 2, LT-50009 Kaunas, Lithuania;Department of Otolaryngology, Kaunas University of Medicine, Eiveniu 2, LT-50009 Kaunas, Lithuania

  • Venue:
  • Artificial Intelligence in Medicine
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Objective: This paper is concerned with soft computing techniques for categorizing laryngeal disorders based on information extracted from an image of patient's vocal folds, a voice signal, and questionnaire data. Methods: Multiple feature sets are exploited to characterize images and voice signals. To characterize colour, texture, and geometry of biological structures seen in colour images of vocal folds, eight feature sets are used. Twelve feature sets are used to obtain a comprehensive characterization of a voice signal (the sustained phonation of the vowel sound /a/). Answers to 14 questions constitute the questionnaire feature set. A committee of support vector machines is designed for categorizing the image, voice, and query data represented by the multiple feature sets into the healthy, nodular and diffuse classes. Five alternatives to aggregate separate SVMs into a committee are explored. Feature selection and classifier design are combined into the same learning process based on genetic search. Results: Data of all the three modalities were available from 240 patients. Among those, 151 patients belong to the nodular class, 64 to the diffuse class and 25 to the healthy class. When using a single feature set to characterize each modality, the test set data classification accuracy of 75.0%, 72.1%, and 85.0% was obtained for the image, voice and questionnaire data, respectively. The use of multiple feature sets allowed to increase the accuracy to 89.5% and 87.7% for the image and voice data, respectively. The test set data classification accuracy of over 98.0% was obtained from a committee exploiting multiple feature sets from all the three modalities. The highest classification accuracy was achieved when using the SVM-based aggregation with hyper parameters of the SVM determined by genetic search. Bearing in mind the difficulty of the task, the obtained classification accuracy is rather encouraging. Conclusions: Combination of both multiple feature sets characterizing a single modality and the three modalities allowed to substantially improve the classification accuracy if compared to the highest accuracy obtained from a single feature set and a single modality. In spite of the unbalanced data sets used, the error rates obtained for the three classes were rather similar.