When a print bilingual dictionary is digitized, whether via optical character recognition or manual entry, errors are inevitably introduced into the resulting electronic version. We investigate automating the detection of errors in an XML representation of a digitized print dictionary, using a hybrid approach that combines rule-based, feature-based, and language-model-based methods. We show that combining methods with random forests is a promising approach, and we find that, in isolation, unsupervised methods rival the performance of supervised ones. Because random forests typically require training data, we investigate how they can be applied to combine individual base methods that are themselves unsupervised without requiring large amounts of labelled data. Experiments reveal empirically that a relatively small amount of data is sufficient and can potentially be reduced further through specific selection criteria.
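The combination scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the three base detectors (`rule_score`, `feature_score`, `lm_score`), the toy dictionary entries, and all thresholds are hypothetical stand-ins for the rule-based, feature-based, and language-model-based methods. Each unsupervised detector emits a score per entry, the scores become a feature vector, and a random forest trained on a small labelled sample combines them.

```python
import re
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rule_score(entry):
    # Rule-based check (illustrative): flag entries whose XML angle
    # brackets are unbalanced, a typical structural OCR error.
    return float(entry.count("<") != entry.count(">"))

def feature_score(entry):
    # Feature-based check (illustrative): fraction of characters that are
    # neither alphabetic, whitespace, nor simple XML punctuation.
    odd = sum(1 for c in entry if not (c.isalpha() or c.isspace() or c in "<>/"))
    return odd / max(len(entry), 1)

def lm_score(entry):
    # Crude language-model proxy (illustrative): fraction of character
    # bigrams never seen in a tiny "clean" sample, standing in for a real LM.
    clean = "the quick brown fox jumps over the lazy dog"
    seen = {clean[i:i + 2] for i in range(len(clean) - 1)}
    text = re.sub(r"<[^>]*>", "", entry).lower()
    bigrams = [text[i:i + 2] for i in range(len(text) - 1)]
    if not bigrams:
        return 1.0
    return sum(b not in seen for b in bigrams) / len(bigrams)

def featurize(entries):
    # Stack the unsupervised base-method scores into one feature vector
    # per dictionary entry, for the random forest to combine.
    return np.array([[rule_score(e), feature_score(e), lm_score(e)]
                     for e in entries])

# Tiny labelled sample (1 = entry contains a digitization error), standing
# in for the "relatively small amount of training data" the abstract mentions.
train = ["<w>the dog</w>", "<w>over the fox</w>", "<w>t#e d0g</w", "<w>qu!ck@@</w"]
labels = [0, 0, 1, 1]

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(featurize(train), labels)

pred = forest.predict(featurize(["<w>the lazy dog</w>", "<w>br0wn f#x</w"]))
```

Note that the forest never sees the entries themselves, only the base-method scores, so the labelled data only has to teach it how to weigh the detectors against each other rather than what errors look like in general, which is one intuition for why a small training set can suffice.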