Multiclass text categorization for automated survey coding

Authors:
Daniela Giorgetti;Fabrizio Sebastiani
Affiliations:
Consiglio Nazionale delle Ricerche, 56124 Pisa, Italy;Consiglio Nazionale delle Ricerche, 56124 Pisa, Italy
Venue:
Proceedings of the 2003 ACM symposium on Applied computing
Year:
2003

Abstract

Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). We formulate the problem of automated survey coding as a text categorization problem, i.e. as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of pre-coded answers, and applying the resulting model to the classification of new answers. In this paper we experiment with two different learning techniques, one based on naïve Bayesian classification and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches.
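
As an illustration of the formulation in the abstract, the following is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. It learns from pre-coded answers and assigns exactly one code to a new answer. It is not the authors' implementation, and it does not cover the multiclass support vector machine approach. The function names, the whitespace tokenizer, the codes and the toy answers are all assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_naive_bayes(coded_answers):
    """Count codes and per-code terms from (answer_text, code) pairs."""
    code_counts = Counter()
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for text, code in coded_answers:
        code_counts[code] += 1
        for term in tokenize(text):
            term_counts[code][term] += 1
            vocabulary.add(term)
    return code_counts, term_counts, vocabulary


def classify(model, text):
    """Return the single code with the highest log posterior score."""
    code_counts, term_counts, vocabulary = model
    total_answers = sum(code_counts.values())
    best_code, best_score = None, -math.inf
    for code, n_answers in code_counts.items():
        # Log prior of the code.
        score = math.log(n_answers / total_answers)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(term_counts[code].values()) + len(vocabulary)
        for term in tokenize(text):
            if term in vocabulary:  # terms unseen in training are skipped
                score += math.log((term_counts[code][term] + 1) / denom)
        if score > best_score:
            best_code, best_score = code, score
    return best_code


# Invented toy training set of pre-coded answers (not from the paper's corpus).
TRAINING = [
    ("the pay is too low", "SALARY"),
    ("low salary and no raise", "SALARY"),
    ("my manager never listens", "MANAGEMENT"),
    ("poor management and bad manager", "MANAGEMENT"),
]

model = train_naive_bayes(TRAINING)
print(classify(model, "salary is low"))
```

Because every answer receives exactly one code, this is a single-label multiclass setting: the classifier scores all codes and keeps the best one rather than making an independent yes/no decision per code.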