Expense reimbursement is a time-consuming and labor-intensive process across organizations. In this paper, we present a prototype expense reimbursement system that dramatically reduces the elapsed time and costs involved by eliminating paper from the process life cycle. Our complete solution involves (1) an electronic submission infrastructure that provides multi-channel image capture, secure transport, and centralized storage of paper documents; (2) an unconstrained data mining approach to extracting relevant named entities from unstructured document images; and (3) automation of auditing procedures that enables automatic expense validation with minimal human interaction. Extracting relevant named entities robustly from document images with unconstrained layouts and diverse formatting is a fundamental technical challenge for image-based data mining, question answering, and other information retrieval tasks. In many applications that require this capability, applying traditional language modeling techniques to the stream of OCR text does not give satisfactory results because linguistic context is absent. We present an approach for extracting relevant named entities from document images that combines rich page layout features in the image space with language content in the OCR text, using a discriminative conditional random field (CRF) framework. We integrate this named entity extraction engine into our expense reimbursement solution and evaluate system performance on large collections of real-world receipt images provided by the IBM World Wide Reimbursement Center.
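To make the idea of combining layout features with OCR text concrete, the following is a minimal sketch of per-token feature extraction of the kind a linear-chain CRF could consume. It is an illustration only, not the system's actual feature set: the `OcrToken` structure, the feature names, the amount pattern, and the same-line threshold are all assumptions introduced here.

```python
import re
from dataclasses import dataclass


@dataclass
class OcrToken:
    """Hypothetical OCR output unit: recognized text plus its bounding box in pixels."""
    text: str
    x: float       # left edge of the bounding box
    y: float       # top edge of the bounding box
    width: float
    height: float


# Assumed pattern for a monetary amount such as "42.50" or "$42,50".
AMOUNT_RE = re.compile(r"\$?\d+[.,]\d{2}")


def token_features(tokens, i, page_width, page_height):
    """Build one feature dict mixing text cues and layout cues for token i."""
    tok = tokens[i]
    feats = {
        # Language content taken from the OCR text.
        "lower": tok.text.lower(),
        "is_amount": bool(AMOUNT_RE.fullmatch(tok.text)),
        "has_digit": any(c.isdigit() for c in tok.text),
        # Page layout in image space, normalized by page size.
        "rel_x": round(tok.x / page_width, 2),
        "rel_y": round(tok.y / page_height, 2),
        "rel_height": round(tok.height / page_height, 3),
    }
    if i > 0:
        prev = tokens[i - 1]
        feats["prev_lower"] = prev.text.lower()
        # Assumed heuristic: tokens whose top edges differ by less than
        # half the current token's height are treated as sharing a line.
        feats["same_line_as_prev"] = abs(prev.y - tok.y) < 0.5 * tok.height
    else:
        feats["bos"] = True  # marks the first token of the sequence
    return feats


def page_features(tokens, page_width, page_height):
    """Feature sequence for one page, one dict per OCR token, in reading order."""
    return [
        token_features(tokens, i, page_width, page_height)
        for i in range(len(tokens))
    ]
```

A sequence produced by `page_features` would be paired with one label per token (for example, a hypothetical `TOTAL_AMOUNT` or `OTHER` tag) to train a CRF. The layout features give the model a signal that plain OCR text lacks when the linguistic context is sparse, as on a receipt.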