Information retrieval: data structures and algorithms
Information retrieval: data structures and algorithms
Information retrieval
COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
Automating the creation of information filters
Communications of the ACM - Special issue on information filtering
Personalized information delivery: an analysis of information filtering methods
Communications of the ACM - Special issue on information filtering
A sequential algorithm for training text classifiers
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Information filtering based on user behavior analysis and best match text retrieval
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Improving Generalization with Active Learning
Machine Learning - Special issue on structured connectionist systems
The nature of statistical learning theory
The nature of statistical learning theory
Optimization of relevance feedback weights
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
ANATAGONOMY: a personalized newspaper on the World Wide Web
International Journal of Human-Computer Studies - Special issue: innovative applications of the World Wide Web
Learning routing queries in a query zone
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Learning and Revising User Profiles: The Identification ofInteresting Web Sites
Machine Learning - Special issue on multistrategy learning
Combining labeled and unlabeled data with co-training
COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Boosting and Rocchio applied to text filtering
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Text retrieval and filtering: analytic models of performance
Text retrieval and filtering: analytic models of performance
Making large-scale support vector machine learning practical
Advances in kernel methods
MetaCost: a general method for making classifiers cost-sensitive
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A hybrid user model for news story classification
UM '99 Proceedings of the seventh international conference on User modeling
Active learning using adaptive resampling
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
Analyzing the effectiveness and applicability of co-training
Proceedings of the ninth international conference on Information and knowledge management
Information Retrieval
Learning When Negative Examples Abound
ECML '97 Proceedings of the 9th European Conference on Machine Learning
Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Less is More: Active Learning with Support Vector Machines
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Employing EM and Pool-Based Active Learning for Text Classification
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
The SMART Retrieval System—Experiments in Automatic Document Processing
The SMART Retrieval System—Experiments in Automatic Document Processing
Letizia: an agent that assists web browsing
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1
Erratum: Erratum to "Knowledge discovery and knowledge validation in intensive care"
Artificial Intelligence in Medicine
Summarization as feature selection for text categorization
Proceedings of the tenth international conference on Information and knowledge management
Editorial: special issue on learning from imbalanced data sets
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Hi-index | 0.00 |
In certain classification problems there is a strong a asymmetry between the number of labeled examples available for each of the classes involved. In an extreme case, there may be a complete lack of labeled data for one of the classes while, at the same time, there are adequate labeled examples for the others, accompanied by a large body of unlabeled data. Since most classification algorithms require some information about all classes involved, label estimation for the un-represented class is desired. An important representative of this group of problems is that of user interest/preference modeling where there may be a large number of examples of what the user likes with essentially no counterexamples.Recently, there has been much interest in applying the EM algorithm to incomplete data problems in the area of text retrieval and categorization. We adapt this approach to the asymmetric case of modeling user interests in news articles, where only labeled positive training data are available, with access to a large corpus of unlabeled documents. User modeling is here equivalent to that of user-specific document ranking. EM is used in conjunction with the Naive Bayes model while its output is also utilized by a Support Vector Machine and Rocchio's technique.Our findings demonstrate that the EM algorithm can be quite effective in modeling the negative class under a number of different initialization schemes. Although primarily just the negative training examples are needed, a natural question is whether using all of the estimated labels (i.e., positive and negative) would be more (or less) beneficial. This is important considering that, in this context, the initialization of the negative class for EM is likely not to be very accurate. Experimental results suggest that EM output should be limited to negative label estimates only.