A Probabilistic Neighbourhood Translation Approach for Non-standard Text Categorisation

  • Authors: Ata Kabán
  • Affiliations: School of Computer Science, The University of Birmingham, Birmingham, UK B15 2TT
  • Venue: DS '08 Proceedings of the 11th International Conference on Discovery Science
  • Year: 2008

Abstract

The need for non-standard text categorisation, i.e. categorisation based on some subtle criterion other than topic, may arise in various circumstances. In this study, we consider written responses to a standardised psychometric test for determining the personality trait of human subjects. A number of state-of-the-art text classifiers that have been very successful in standard topic-based classification problems turn out to perform poorly on this task. Here we propose a very simple probabilistic approach, which achieves accurate predictions and demonstrates that this peculiar problem is still solvable by simple statistical text representation means. We then extend this approach to include a latent variable, in order to obtain additional explanatory information beyond a black-box prediction.