Large-scale text categorization is an important research topic in Web data mining. One of its challenges is reducing the human effort needed to label text documents when building reliable classification models. Many previous studies have applied active learning to automatic text categorization, selecting the most informative documents for manual labeling. Most of these studies select a single unlabeled document in each iteration, so the text categorization model has to be retrained after every newly labeled document. In this paper, we present a novel active learning algorithm that selects a batch of text documents for manual labeling in each iteration. The key to batch mode active learning is reducing the redundancy among the selected examples so that each example provides unique information for updating the model. To this end, we use the Fisher information matrix as the measure of model uncertainty and choose the set of documents that effectively maximizes the Fisher information of the classification model. Extensive experiments on three different datasets show that our algorithm is more effective than state-of-the-art active learning techniques for text categorization and can be a promising tool for large-scale categorization of World Wide Web documents.
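The batch selection idea can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the optimization procedure, so everything below is an assumption made for illustration. The sketch assumes a binary logistic regression model with weight vector `w`, scores a candidate batch by the trace of the inverse of its Fisher information multiplied by the Fisher information of the whole pool (lower is better), and builds the batch with a simple greedy search. The function name `select_batch` and the `ridge` parameter are hypothetical.

```python
import numpy as np


def select_batch(X_pool, w, batch_size, ridge=1e-2):
    """Greedily pick documents whose combined Fisher information best
    covers the Fisher information of the whole unlabeled pool.

    Assumptions (not from the source): binary logistic regression,
    the criterion trace(I_selected^{-1} @ I_pool), and greedy search.
    """
    n, d = X_pool.shape
    # Predicted probabilities under the current model.
    p = 1.0 / (1.0 + np.exp(-X_pool @ w))
    # p * (1 - p) is largest where the model is most uncertain.
    s = p * (1.0 - p)
    # Fisher information of the pool, averaged over documents.
    I_pool = (X_pool * s[:, None]).T @ X_pool / n

    selected = []
    # The ridge term keeps the matrix invertible before any document is added.
    I_sel = ridge * np.eye(d)
    for _ in range(batch_size):
        best, best_score = None, np.inf
        for i in range(n):
            if i in selected:
                continue
            x = X_pool[i]
            candidate = I_sel + s[i] * np.outer(x, x)
            # Lower score: the candidate batch explains more of the pool.
            score = np.trace(np.linalg.solve(candidate, I_pool))
            if score < best_score:
                best, best_score = i, score
        selected.append(best)
        I_sel = I_sel + s[best] * np.outer(X_pool[best], X_pool[best])
    return selected


# Toy pool: documents 0 and 1 are identical, document 2 is different.
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(select_batch(X, np.zeros(2), batch_size=2))
```

In this toy pool, a batch of two contains one of the duplicated documents and the distinct one. Adding the second duplicate barely lowers the score, which is the redundancy effect that batch mode selection is meant to avoid. The greedy loop solves one linear system per candidate, so it is only practical for small pools and low-dimensional features.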