Cost effective depression patient thought record categorization via self-taught learning

  • Authors:
  • Hua Wang;Heng Huang;Monica Basco;Molly Lopez;Fillia Makedon

  • Affiliations:
  • University of Texas at Arlington, TX;University of Texas at Arlington, TX;University of Texas at Arlington, TX;The University of Texas at Austin, TX;University of Texas at Arlington, TX

  • Venue:
  • Proceedings of the 4th International Conference on PErvasive Technologies Related to Assistive Environments
  • Year:
  • 2011

Abstract

Automatic patient thought record (TR) categorization is important in Cognitive Behavior Therapy (CBT), a useful augmentation of standard clinical treatment for Major Depressive Disorder (MDD). Because both collecting and labeling TR data are expensive, it is cost-prohibitive to require a large amount of TR data, together with the corresponding category labels, to train a classification model with high accuracy. In practice only a very limited amount of labeled and unlabeled TR training data is available, so traditional semi-supervised learning and transfer learning methods, the most commonly used strategies for coping with scarce training data in statistical learning, cannot work well for automatic TR categorization. Recognizing these challenges, in this paper we propose to approach the TR categorization problem from a new perspective via self-taught learning, an emerging topic in machine learning. Self-taught learning is a special type of transfer learning: instead of requiring labeled data from an auxiliary domain relevant to the classification task of interest, as traditional transfer learning methods do, it learns the inherent structure of the auxiliary data and does not require their labels. Consequently, we can learn a classifier from the limited amount of labeled TR texts that achieves decent classification accuracy, with assistance from a large amount of text data obtained from cheap, or even no-cost, resources. That is, a cost-effective TR categorization system can be built, which is of particular use in practical diagnosis and in the training of new therapists. We demonstrate the proposed method on the task of classifying real depression homework texts, where promising experimental results validate the effectiveness of our new method.
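
The general self-taught learning recipe described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how the structure of the auxiliary data is learned, so plain k-means over bag-of-words vectors is swapped in here as a stand-in, followed by a nearest-centroid classifier. All texts, the placeholder labels "A" and "B", and every function name are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def bow(text, vocab):
    """Unit-length bag-of-words vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def learn_bases(unlabeled_vecs, k, iters=10):
    """Step 1: learn k basis vectors from UNLABELED auxiliary data.
    Assumption: k-means with cosine similarity stands in for the
    structure-learning step; the first k vectors seed the clusters."""
    bases = [list(v) for v in unlabeled_vecs[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in unlabeled_vecs:
            nearest = max(range(k), key=lambda i: dot(v, bases[i]))
            groups[nearest].append(v)
        for i, group in enumerate(groups):
            if group:
                mean = [sum(col) / len(group) for col in zip(*group)]
                norm = math.sqrt(sum(m * m for m in mean)) or 1.0
                bases[i] = [m / norm for m in mean]
    return bases

def encode(vec, bases):
    """Step 2: re-represent a text by its similarity to each learned basis."""
    return [dot(vec, b) for b in bases]

def train_centroids(features, labels):
    """Step 3: fit a nearest-centroid classifier on the few labeled examples."""
    sums, counts = {}, Counter(labels)
    for feat, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, x in enumerate(feat):
            acc[i] += x
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}

def predict(feature, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(feature, centroids[y])),
    )

# Hypothetical toy data: cheap unlabeled auxiliary text, plus two labeled texts.
unlabeled = [
    "sad tired hopeless",
    "happy walk friends",
    "sad hopeless alone",
    "happy friends lunch",
]
labeled = [("i feel sad and hopeless", "A"), ("happy lunch with friends", "B")]

vocab = sorted({w for text in unlabeled for w in text.split()})
bases = learn_bases([bow(t, vocab) for t in unlabeled], k=2)
centroids = train_centroids(
    [encode(bow(text, vocab), bases) for text, _ in labeled],
    [label for _, label in labeled],
)

# This text shares no vocabulary words with either labeled example;
# only the structure learned from the unlabeled data connects it to "A".
prediction = predict(encode(bow("tired and alone", vocab), bases), centroids)
print(prediction)  # prints A
```

In this toy setting the query text has no words in common with the labeled examples, so a classifier trained on the raw labeled texts alone would have nothing to match on. The representation learned from the unlabeled texts supplies the link, which is the intuition behind using inexpensive auxiliary text to compensate for scarce labeled TR data.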