Multiclass text categorization for automated survey coding

  • Authors:
  • Daniela Giorgetti;Fabrizio Sebastiani

  • Affiliations:
  • Consiglio Nazionale delle Ricerche, 56124 Pisa, Italy;Consiglio Nazionale delle Ricerche, 56124 Pisa, Italy

  • Venue:
  • Proceedings of the 2003 ACM symposium on Applied computing
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). We formulate the problem of automated survey coding as a text categorization problem, i.e. as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of pre-coded answers, and applying the resulting model to the classification of new answers. In this paper we experiment with two different learning techniques, one based on naïmillve Bayesian classification and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches.