We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions organized into more than 1,000 categories in a hierarchy. To the best of our knowledge, this is the first systematic evaluation of different classification methods on question topic classification, and on short texts more generally. Specifically, we empirically evaluate the following in classifying questions into CQA categories: (a) the usefulness of n-gram features and bag-of-words features; (b) the performance of three standard classification algorithms (naive Bayes, maximum entropy, and support vector machines); (c) the performance of state-of-the-art hierarchical classification algorithms; (d) the effect of training data size on performance; and (e) the effectiveness of the different components of CQA data, including the subject, content, asker, and best answer. The experimental results show which aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the findings from this study will be useful in real-world classification problems. © 2012 Wiley Periodicals, Inc.
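As a rough illustration of items (a) and (b), the sketch below contrasts bag-of-words features with word n-gram features and feeds either one to a small multinomial naive Bayes classifier. It is not the implementation evaluated in the study: the function names, the `NaiveBayes` class, the whitespace tokenization, the Laplace smoothing constant, and the four toy questions with their category labels are all assumptions made for this example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Unigram features: lowercase, whitespace-separated tokens."""
    return text.lower().split()


def word_ngrams(text, n_max=2):
    """All word n-grams for n = 1..n_max (unigrams first, then bigrams, ...)."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, featurize, alpha=1.0):
        self.featurize = featurize
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = {c: Counter() for c in self.class_counts}
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = self.featurize(text)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        # Features never seen in training are ignored at prediction time.
        feats = [f for f in self.featurize(text) if f in self.vocab]
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = (sum(self.feat_counts[label].values())
                     + self.alpha * len(self.vocab))
            for f in feats:
                score += math.log((self.feat_counts[label][f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy questions and category labels, invented for this example.
train_texts = [
    "how do i fix my laptop screen",
    "best laptop for programming",
    "how do i cook rice",
    "best recipe for chicken soup",
]
train_labels = ["Computers", "Computers", "Food", "Food"]

bow_model = NaiveBayes(bag_of_words).fit(train_texts, train_labels)
ngram_model = NaiveBayes(word_ngrams).fit(train_texts, train_labels)

print(bow_model.predict("laptop screen is broken"))
print(ngram_model.predict("laptop screen is broken"))
```

Because the classifier takes the feature extractor as a parameter, the same training and prediction code can be run on both feature sets, which is the kind of comparison item (a) describes. A study at the scale of millions of questions would need sparse data structures and a held-out evaluation set, both omitted here.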