Automatically processing production documents requires both document type detection and data capture, that is, finding the appropriate index data in a post-OCR representation of the document. Current learning-based methods perform quite well because many similar documents are created from the same template, but their machine learning models require intensive training and are hard to update frequently. We provide a method for continuously incorporating user feedback into a layout-based extraction process that supports immediate learning while limiting the size of the model. The method is evaluated on a tagged corpus of more than 5,000 business documents. It allows continuous re-training of the model, thus adapting it to new document templates, and it also allows starting from scratch with an empty model, requiring less than 10% of the corpus as training documents to reach an accuracy measure of more than 80%.
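The abstract does not spell out the algorithm, so the following is only a minimal sketch of the two properties it names: immediate learning from user feedback and a bounded model size. Everything in it is an assumption made for illustration and not the authors' method: the class name `FeedbackExtractor`, the representation of a layout as a set of string tokens, nearest-neighbour matching by Jaccard similarity, and least-recently-used eviction as the way to cap the model.

```python
from collections import OrderedDict


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class FeedbackExtractor:
    """Hypothetical instance-based extractor (illustrative, not from the paper)."""

    def __init__(self, max_examples=1000, min_similarity=0.5):
        self.max_examples = max_examples      # assumed cap on model size
        self.min_similarity = min_similarity  # assumed matching threshold
        self._examples = OrderedDict()        # id -> (layout signature, fields)
        self._next_id = 0

    def predict(self, signature):
        """Return the fields of the most similar stored layout, or {}."""
        best_id, best_sim = None, 0.0
        for ex_id, (stored, _) in self._examples.items():
            sim = jaccard(signature, stored)
            if sim > best_sim:
                best_id, best_sim = ex_id, sim
        if best_id is None or best_sim < self.min_similarity:
            return {}
        self._examples.move_to_end(best_id)   # mark as recently used
        return dict(self._examples[best_id][1])

    def feedback(self, signature, corrected_fields):
        """Store a user-corrected example; it is usable immediately."""
        self._examples[self._next_id] = (frozenset(signature), dict(corrected_fields))
        self._next_id += 1
        while len(self._examples) > self.max_examples:
            self._examples.popitem(last=False)  # evict least recently used


# Starting from an empty model, one correction is enough to extract
# from the next document that shares the same (made-up) layout tokens.
extractor = FeedbackExtractor(max_examples=2)
invoice = {"Invoice@top-left", "Total@bottom-right", "ACME@top-right"}
print(extractor.predict(invoice))   # {}
extractor.feedback(invoice, {"total": (520, 760)})
print(extractor.predict(invoice))   # {'total': (520, 760)}
```

Because each correction is stored as an example rather than folded into a batch-trained model, no retraining pass is needed before the next prediction. The eviction rule is one simple way to keep the store small; other policies could serve the same purpose.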