With the flourishing of community-based question answering (cQA) services such as Yahoo! Answers, more and more web users turn to these sites to satisfy their information needs. Understanding users' information needs, as expressed through their questions, is crucial to information providers, and question classification in cQA is studied for this purpose. However, there are two main difficulties in applying traditional methods (question classification in TREC QA and text classification) to cQA: (1) Traditional methods confine themselves to classifying a text or question into two or a few predefined categories, whereas the number of categories in cQA is much larger; Yahoo! Answers, for example, contains 1,263 categories. Our empirical results show that as the number of categories grows to even a moderate size, classification accuracy decreases dramatically. (2) Unlike normal texts, cQA questions are very short; owing to data sparseness, they do not provide sufficient word co-occurrence or shared information for a good similarity measure. In this paper, we propose a two-stage approach for question classification in cQA that tackles both difficulties. In the first stage, we perform a search process that prunes the large-scale category set so that the classification effort focuses on a small subset of categories. In the second stage, we enrich questions with Wikipedia semantic knowledge to address data sparseness. The classification model is then trained on the enriched small subset. We evaluate the proposed method on Yahoo! Answers with 1,263 categories. The experimental results show that it significantly outperforms the baseline method, with an error reduction of 23.21%.
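The abstract describes the two-stage pipeline only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. Stage 1 is approximated by retrieving the `k` most similar labeled questions with bag-of-words cosine similarity and keeping the `n` categories that occur most often among them. Stage 2 uses a tiny hypothetical term-to-concept dictionary, `WIKI_CONCEPTS`, as a stand-in for Wikipedia semantic knowledge. The classifier is a multinomial Naive Bayes chosen for brevity; the paper's actual model may differ. All function names, parameters, and toy data here are illustrative.

```python
import math
from collections import Counter, defaultdict

# HYPOTHETICAL stand-in for Wikipedia knowledge: maps a surface term to
# concept labels. A real system would derive such mappings from Wikipedia.
WIKI_CONCEPTS = {
    "iphone": ["Apple_Inc.", "Smartphone"],
    "android": ["Smartphone", "Google"],
    "lasagna": ["Italian_cuisine", "Pasta"],
    "pasta": ["Italian_cuisine"],
}

def tokenize(text):
    return text.lower().replace("?", " ").split()

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def prune_categories(question, train, k=5, n=2):
    """Stage 1 (assumed form): retrieve the k most similar training
    questions and keep the n most frequent categories among them."""
    q = Counter(tokenize(question))
    ranked = sorted(train, key=lambda ex: cosine(q, Counter(tokenize(ex[0]))),
                    reverse=True)
    votes = Counter(cat for _, cat in ranked[:k])
    return [cat for cat, _ in votes.most_common(n)]

def enrich(text):
    """Stage 2 (assumed form): append concept features to the bag of words."""
    tokens = tokenize(text)
    return tokens + ["wiki:" + c for t in tokens for c in WIKI_CONCEPTS.get(t, [])]

def train_nb(examples):
    """Multinomial Naive Bayes counts; a stand-in classifier."""
    word_counts, doc_counts = defaultdict(Counter), Counter()
    for tokens, cat in examples:
        word_counts[cat].update(tokens)
        doc_counts[cat] += 1
    vocab = {t for counts in word_counts.values() for t in counts}
    return word_counts, doc_counts, vocab

def predict_nb(model, tokens):
    """Return the most probable category, with add-one smoothing;
    tokens unseen in training are ignored."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for cat in doc_counts:
        total = sum(word_counts[cat].values())
        score = math.log(doc_counts[cat] / total_docs)
        for t in tokens:
            if t in vocab:
                score += math.log((word_counts[cat][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = cat, score
    return best

def classify(question, train, k=5, n=2):
    """Prune categories, then train on the enriched subset and predict."""
    candidates = prune_categories(question, train, k, n)
    subset = [(enrich(text), cat) for text, cat in train if cat in candidates]
    return predict_nb(train_nb(subset), enrich(question))

if __name__ == "__main__":
    train = [
        ("how do i reset my iphone", "Cell_Phones"),
        ("best android phone for battery life", "Cell_Phones"),
        ("my iphone screen is frozen", "Cell_Phones"),
        ("how long to bake lasagna", "Cooking"),
        ("easy pasta sauce recipe", "Cooking"),
        ("how to cook pasta al dente", "Cooking"),
        ("who won the world cup in 2006", "Soccer"),
    ]
    print(prune_categories("can i bake lasagna with fresh pasta", train))
    print(classify("can i bake lasagna with fresh pasta", train))
```

On the toy data, stage 1 drops `Soccer` for the lasagna question, and the concept features `wiki:Italian_cuisine` and `wiki:Pasta` add evidence for `Cooking` beyond the raw word overlap. Training a fresh model per question on the pruned subset mirrors the abstract's statement that the classifier is trained on the enriched small subset; a practical system would probably cache models per candidate set rather than retrain each time.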