In many important text classification problems, acquiring class labels for training documents is costly, while gathering large quantities of unlabeled data is cheap. This paper shows that the accuracy of text classifiers trained with a small number of labeled documents can be improved by augmenting this small training set with a large pool of unlabeled documents. We present a theoretical argument showing that, under common assumptions, unlabeled data contain information about the target function. We then introduce an algorithm for learning from labeled and unlabeled text that combines Expectation-Maximization with a naive Bayes classifier. The algorithm first trains a classifier on the available labeled documents and uses it to probabilistically label the unlabeled documents; it then trains a new classifier using the labels for all the documents, and iterates to convergence. Experimental results, obtained using text from three different real-world tasks, show that the use of unlabeled data reduces classification error by up to 33%.
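The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a multinomial naive Bayes model with Laplace smoothing, documents given as word-id-to-count dictionaries, and a stopping rule based on the change in the probabilistic labels. The abstract specifies none of these details, and all function and parameter names (`train_nb`, `posterior`, `em_naive_bayes`, `alpha`, `tol`) are hypothetical.

```python
import math

def train_nb(docs, resp, n_classes, vocab_size, alpha=1.0):
    """Fit naive Bayes from (possibly fractional) class weights per document.

    docs: list of {word_id: count}; resp: list of per-class weights.
    alpha is an assumed Laplace smoothing constant.
    """
    class_mass = [alpha] * n_classes
    word_mass = [[alpha] * vocab_size for _ in range(n_classes)]
    for doc, r in zip(docs, resp):
        for c in range(n_classes):
            class_mass[c] += r[c]
            for w, n in doc.items():
                word_mass[c][w] += r[c] * n
    total = sum(class_mass)
    log_prior = [math.log(m / total) for m in class_mass]
    log_word = []
    for c in range(n_classes):
        z = sum(word_mass[c])
        log_word.append([math.log(m / z) for m in word_mass[c]])
    return log_prior, log_word

def posterior(doc, log_prior, log_word):
    """Class posterior for one document, computed stably in log space."""
    scores = [lp + sum(n * lw[w] for w, n in doc.items())
              for lp, lw in zip(log_prior, log_word)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def em_naive_bayes(labeled, labels, unlabeled, n_classes, vocab_size,
                   max_iter=50, tol=1e-6):
    """Train on labeled docs, then alternate soft labeling and retraining."""
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    model = train_nb(labeled, hard, n_classes, vocab_size)
    prev = None
    for _ in range(max_iter):
        # Probabilistically label the unlabeled documents with the current model.
        soft = [posterior(d, *model) for d in unlabeled]
        # Retrain on all documents, using hard labels plus the soft labels.
        model = train_nb(labeled + unlabeled, hard + soft, n_classes, vocab_size)
        # Assumed stopping rule: the soft labels have stopped changing.
        if prev is not None:
            delta = max((abs(a - b) for p, q in zip(prev, soft)
                         for a, b in zip(p, q)), default=0.0)
            if delta < tol:
                break
        prev = soft
    return model

def predict(doc, model):
    """Return the index of the most probable class for a document."""
    probs = posterior(doc, *model)
    return max(range(len(probs)), key=probs.__getitem__)

# Toy data (invented for illustration): words 0-1 lean to class 0, words 2-3 to class 1.
labeled = [{0: 2, 1: 1}, {2: 1, 3: 2}]
labels = [0, 1]
unlabeled = [{0: 1, 1: 3}, {1: 2}, {3: 3, 2: 1}, {2: 2}]
model = em_naive_bayes(labeled, labels, unlabeled, n_classes=2, vocab_size=4)
```

Here `predict({1: 1}, model)` and `predict({2: 1}, model)` classify two short unseen documents. Working in log space keeps the posterior computation stable for long documents, where products of word probabilities would otherwise underflow.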