Mining association language patterns using a distributional semantic model for negative life event classification

  • Authors:
  • Liang-Chih Yu;Chien-Lung Chan;Chao-Cheng Lin;I-Chun Lin

  • Affiliations:
  • Department of Information Management, Yuan Ze University, Chung-Li, Taiwan, ROC;Department of Information Management, Yuan Ze University, Chung-Li, Taiwan, ROC;Department of Psychiatry, National Taiwan University Hospital and National Taiwan University College of Medicine, Taipei, Taiwan, ROC;Department of Computer Science and Information Management, HungKuang University, Taichung, Taiwan, ROC and Department of Industrial Management, National Yunlin University of Science and Technology ...

  • Venue:
  • Journal of Biomedical Informatics
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Purpose: Negative life events, such as the death of a family member, an argument with a spouse or the loss of a job, play an important role in triggering depressive episodes. Therefore, it is worthwhile to develop psychiatric services that can automatically identify such events. This study describes the use of association language patterns, i.e., meaningful combinations of words (e.g., ), as features to classify sentences with negative life events into predefined categories (e.g., Family, Love, Work). Methods: This study proposes a framework that combines a supervised data mining algorithm and an unsupervised distributional semantic model to discover association language patterns. The data mining algorithm, called association rule mining, was used to generate a set of seed patterns by incrementally associating frequently co-occurring words from a small corpus of sentences labeled with negative life events. The distributional semantic model was then used to discover more patterns similar to the seed patterns from a large, unlabeled web corpus. Results: The experimental results showed that association language patterns were significant features for negative life event classification. Additionally, the unsupervised distributional semantic model was not only able to improve the level of performance but also to reduce the reliance of the classification process on the availability of a large, labeled corpus.