A Probabilistic Neighbourhood Translation Approach for Non-standard Text Categorisation

Authors:
Ata Kabán
Affiliations:
School of Computer Science, The University of Birmingham, Birmingham, UK B15 2TT
Venue:
DS '08 Proceedings of the 11th International Conference on Discovery Science
Year:
2008

Abstract

The need for non-standard text categorisation, i.e. categorisation based on some subtle criterion other than topic, may arise in various circumstances. In this study, we consider written responses to a standardised psychometric test for determining the personality trait of human subjects. A number of state-of-the-art text classifiers that have been very successful in standard topic-based classification problems turn out to perform poorly on this task. Here we propose a very simple probabilistic approach, which achieves accurate predictions and demonstrates that this peculiar problem is still solvable by means of simple statistical text representations. We then extend this approach to include a latent variable, in order to obtain additional explanatory information beyond a black-box prediction.