Multi-category classification of short dialogues is a task humans perform routinely. When assigning a question to an expert, a customer service operator tries to classify the customer query into one of N classes for which experts are available. Similarly, questions on the web (for example, questions on Yahoo Answers) can be automatically forwarded to a restricted group of people with specific expertise. Typical questions are short and assume background world knowledge for correct classification. With an exponentially increasing amount of knowledge available, and with sources differing in their properties (labeled vs. unlabeled, structured vs. unstructured), no single knowledge-transfer algorithm, such as transfer learning, multi-task learning, or self-taught learning, can be applied universally. In this work we show that bag-of-words classifiers perform poorly on noisy, short conversational text snippets. We present an algorithm that leverages heterogeneous data sources and algorithms and yields significant improvements over any single algorithm, rivaling human performance. We use a different algorithm for each knowledge source and use mutual information to aggressively prune features. With heterogeneous data sources including Wikipedia, the Open Directory Project (ODP), and Yahoo Answers, we show 89.4% and 96.8% correct classification on the Google Answers corpus and the Switchboard corpus, respectively, using only 200 features per class. This reflects a large improvement over bag-of-words approaches and a 48-65% error reduction over the previously published state of the art (Gabrilovich et al. 2006).
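The feature-pruning step mentioned above can be illustrated with a minimal sketch. The abstract does not say which formulation of mutual information is used, so the code below assumes the common document-level variant, which scores each term against each class from a two-by-two table of document counts and keeps the top k terms per class. The function names, the toy documents, and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between term presence and class membership.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    mi = 0.0
    # Each tuple: (joint count, term-side marginal, class-side marginal).
    for joint, term_marginal, class_marginal in (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ):
        if joint > 0:  # an empty cell contributes nothing to the sum
            mi += (joint / n) * math.log2(n * joint / (term_marginal * class_marginal))
    return mi


def top_features_per_class(docs, labels, k=200):
    """Return, for every class, the k terms with the highest mutual information.

    docs is a list of token lists; labels is a parallel list of class names.
    Ties are broken alphabetically (an arbitrary choice made for this sketch).
    """
    doc_sets = [set(d) for d in docs]
    n_docs = len(doc_sets)
    vocab = set().union(*doc_sets)
    doc_freq = Counter(t for d in doc_sets for t in d)
    class_sizes = Counter(labels)

    selected = {}
    for c in class_sizes:
        freq_in_class = Counter(
            t for d, y in zip(doc_sets, labels) if y == c for t in d
        )
        scores = {}
        for t in vocab:
            n11 = freq_in_class[t]
            n10 = doc_freq[t] - n11
            n01 = class_sizes[c] - n11
            n00 = n_docs - n11 - n10 - n01
            scores[t] = mutual_information(n11, n10, n01, n00)
        ranked = sorted(vocab, key=lambda t: (-scores[t], t))
        selected[c] = ranked[:k]
    return selected


# Toy data, invented for illustration only.
toy_docs = [
    ["python", "code", "bug"],
    ["python", "compiler"],
    ["flu", "doctor"],
    ["doctor", "fever", "flu"],
]
toy_labels = ["computers", "computers", "health", "health"]
toy_selection = top_features_per_class(toy_docs, toy_labels, k=3)
```

Mutual information is symmetric in this formulation, so a term whose absence strongly predicts a class scores as highly as a term whose presence does. Whether the paper filters such negatively associated terms is not stated here.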