Rule-based systems for processing text data encode the knowledge of a human expert in a rule base and make decisions by applying those rules to the input data. Similarly, supervised learning systems learn patterns present in a given dataset and use them to make decisions on similar and related data. The performance of both classes of models depends largely on the training examples they have seen. Even when a trained model fits its training data well, its accuracy on new test data may be considerably different. Computing the accuracy of a learned model on a new unlabeled dataset is a challenging problem: it requires costly labeling, and because the datasets involved are large, the labeled portion is still likely to cover only a subset of the new data. In this paper, we present a method to estimate the accuracy of a given model on a new dataset without manually labeling the data. We verify our method on large datasets for two shallow text processing tasks, document classification and postal address segmentation, using both supervised machine learning methods and human-generated rule-based models.