Microfilm documents contain a wealth of information, but extracting and organizing this information by hand is slow, error-prone, and tedious. As an initial step toward automating access to this information, we describe in this paper an algorithmic process to automatically identify record patterns found in microfilm tables for pre-specified application domains. Our table-processing algorithm accepts an XML input file describing the individual cells of a table taken from a microfilm document, and finds for each record in the document the cells that together comprise the record. Two key features drive the algorithm: (1) geometric layout and (2) label matching with respect to a given domain-specific application ontology. The algorithm achieved an accuracy of 92% on our test corpus of genealogical microfilm tables.
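The two features named above, geometric layout and ontology label matching, can be illustrated with a minimal sketch. This is not the algorithm from the paper. The XML schema (a `cell` element with `x` and `y` attributes), the `ONTOLOGY` dictionary, and the function names are all hypothetical, chosen only for illustration. The sketch also assumes that each record occupies exactly one table row under a single header row, which the abstract does not state.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical input format: the abstract does not specify the real schema.
SAMPLE_XML = """
<table>
  <cell x="0" y="0">Name</cell>
  <cell x="100" y="0">Born</cell>
  <cell x="0" y="20">Ann Lee</cell>
  <cell x="100" y="20">1871</cell>
  <cell x="0" y="40">John Roe</cell>
  <cell x="100" y="40">1880</cell>
</table>
"""

# Hypothetical stand-in for a domain ontology: concept -> accepted labels.
ONTOLOGY = {
    "person_name": {"name", "full name"},
    "birth_date": {"born", "birth", "date of birth"},
}


def parse_cells(xml_text):
    """Read every <cell> element into a dict with position and text."""
    root = ET.fromstring(xml_text)
    return [
        {
            "x": float(el.get("x")),
            "y": float(el.get("y")),
            "text": (el.text or "").strip(),
        }
        for el in root.iter("cell")
    ]


def match_label(text):
    """Return the ontology concept whose label set contains text, else None."""
    key = text.strip().lower()
    for concept, labels in ONTOLOGY.items():
        if key in labels:
            return concept
    return None


def find_records(cells):
    """Group cells into one record per row, keyed by ontology concept."""
    # Geometric layout: cells sharing a top edge (y) form a row.
    rows = defaultdict(list)
    for cell in cells:
        rows[cell["y"]].append(cell)
    ordered = [sorted(rows[y], key=lambda c: c["x"]) for y in sorted(rows)]

    # Label matching: the header is the first row whose cells all match.
    header_idx = next(
        (i for i, row in enumerate(ordered)
         if all(match_label(c["text"]) for c in row)),
        None,
    )
    if header_idx is None:
        raise ValueError("no row matched the ontology labels")

    # Map each header cell's left edge (x) to its concept.
    columns = {c["x"]: match_label(c["text"]) for c in ordered[header_idx]}

    # Each later row becomes a record; cells align to columns by x.
    return [
        {columns[c["x"]]: c["text"] for c in row if c["x"] in columns}
        for row in ordered[header_idx + 1:]
    ]


records = find_records(parse_cells(SAMPLE_XML))
```

For the sample input, `records` holds one dictionary per data row, with values keyed by `person_name` and `birth_date`. Real microfilm tables would likely need tolerance for misaligned cell edges and for labels that only approximately match the ontology, both of which this sketch omits.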