Supervised learning approaches to text classification must in practice often work with small and unsystematically collected training sets. The alternative to supervised learning is usually viewed as building classifiers by hand, using a domain expert's understanding of which features of the text are related to the class of interest. This is expensive, requires a degree of sophistication about linguistics and classification, and makes it difficult to use combinations of weak predictors. We propose instead combining domain knowledge with training examples in a Bayesian framework. Domain knowledge is used to specify a prior distribution for the parameters of a logistic regression model, and labeled training data is used to produce a posterior distribution, whose mode we take as the final classifier. We show on three text categorization data sets that this approach can rescue what would otherwise be disastrously bad training situations, producing much more effective classifiers.
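The idea of taking the posterior mode as the classifier can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say which prior family is used, so the independent Gaussian prior here is an assumption, as are the function name `fit_map_logistic`, the four-word vocabulary, the prior means, and the two toy documents. Domain knowledge enters only through `prior_mean`, where words an expert believes indicate the class get a positive mean.

```python
import numpy as np

def fit_map_logistic(X, y, prior_mean, prior_var, n_iter=500):
    """Posterior mode of logistic regression weights.

    Assumes an independent Gaussian prior N(prior_mean[j], prior_var[j])
    on each weight (an illustrative choice, not taken from the paper).
    Labels y are 0 or 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_var = np.asarray(prior_var, dtype=float)

    # Step size 1/L, where L bounds the curvature of the negative
    # log posterior, so plain gradient descent cannot diverge.
    L = 0.25 * np.linalg.norm(X, 2) ** 2 + np.max(1.0 / prior_var)
    step = 1.0 / L

    w = prior_mean.copy()  # start at the prior mode
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        # Gradient = likelihood term + pull back toward the prior mean.
        grad = X.T @ (p - y) + (w - prior_mean) / prior_var
        w -= step * grad
    return w

# Hypothetical vocabulary: ["goal", "match", "election", "the"]; class = sports.
X_train = np.array([[0, 0, 1, 1],   # "the election"  -> not sports
                    [0, 1, 0, 1]])  # "the match"     -> sports
y_train = np.array([0, 1])

flat_mean = np.zeros(4)                          # no domain knowledge
informed_mean = np.array([2.0, 2.0, 0.0, 0.0])   # expert: "goal", "match" suggest sports
prior_var = np.ones(4)

w_flat = fit_map_logistic(X_train, y_train, flat_mean, prior_var)
w_informed = fit_map_logistic(X_train, y_train, informed_mean, prior_var)

# Test document "the goal": the word "goal" never occurs in the training data.
x_test = np.array([1, 0, 0, 1])
p_flat = 1.0 / (1.0 + np.exp(-x_test @ w_flat))
p_informed = 1.0 / (1.0 + np.exp(-x_test @ w_informed))
print("flat prior:", p_flat, "informed prior:", p_informed)
```

Because "goal" never appears in the two training documents, the data say nothing about its weight, and the weight stays at its prior mean. With the zero-mean prior the test document is scored using the remaining word alone, while the informed prior lets the expert's belief carry the prediction. This is the kind of small, unsystematic training situation the abstract describes.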