Gobbledoc is a system that provides remote access to stored documents and is based on syntactic document analysis and optical character recognition (OCR). In Gobbledoc, image processing, document analysis, and OCR operations take place in batch mode when the documents are acquired. The paper describes the document image acquisition process and the knowledge base that must be entered into the system to process a family of page images. It also describes how the X-Y tree data structure converts a 2-D page-segmentation problem into a series of 1-D string-parsing problems that can be tackled using conventional compiler tools. Syntactic analysis is used in Gobbledoc to divide each page into labeled rectangular blocks. Blocks labeled as text are converted by OCR to obtain a secondary (ASCII) document representation. Because such symbolic files are better suited to computerized search than to human access to the document content, and because too many visual layout cues (including some special characters) are lost in the OCR process, Gobbledoc preserves the original block images for human browsing. Storage, networking, and display issues specific to document images are also discussed.
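The reduction from a 2-D page to 1-D strings can be illustrated with a minimal recursive X-Y cut sketch. This is not Gobbledoc's implementation: the page is assumed to be a small binary matrix (1 = ink), each projection profile is encoded as a string over the hypothetical alphabet `b` (ink) and `w` (white), and a regular expression stands in for the grammar-driven parsing that the paper performs with compiler tools. The names `Node`, `profile_string`, `ink_runs`, `xy_cut`, and `leaves` are illustrative only.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

Page = List[List[int]]  # assumed input: 1 = ink pixel, 0 = background


@dataclass
class Node:
    """A rectangular block of the page; bottom and right are exclusive."""
    top: int
    left: int
    bottom: int
    right: int
    children: List["Node"] = field(default_factory=list)


def profile_string(page: Page, node: Node, horizontal: bool) -> str:
    """Project the block onto one axis: one symbol per row (horizontal cut)
    or per column (vertical cut). 'b' means the line contains ink."""
    if horizontal:
        lines = (page[r][node.left:node.right]
                 for r in range(node.top, node.bottom))
    else:
        lines = ([page[r][c] for r in range(node.top, node.bottom)]
                 for c in range(node.left, node.right))
    return "".join("b" if any(line) else "w" for line in lines)


def ink_runs(profile: str, min_gap: int) -> List[Tuple[int, int]]:
    """Parse the 1-D string into ink spans; white gaps shorter than
    min_gap are absorbed into the surrounding span."""
    pattern = re.compile(rf"b+(?:w{{0,{min_gap - 1}}}b+)*")
    return [m.span() for m in pattern.finditer(profile)]


def xy_cut(page: Page, node: Node, horizontal: bool = True,
           min_gap: int = 1, stuck: bool = False) -> Node:
    """Recursively split a block, alternating cut directions per level."""
    runs = ink_runs(profile_string(page, node, horizontal), min_gap)
    if len(runs) <= 1:
        # No cut in this direction: try the other direction once, else leaf.
        if not stuck:
            xy_cut(page, node, not horizontal, min_gap, stuck=True)
        return node
    for start, end in runs:
        if horizontal:
            child = Node(node.top + start, node.left,
                         node.top + end, node.right)
        else:
            child = Node(node.top, node.left + start,
                         node.bottom, node.left + end)
        node.children.append(xy_cut(page, child, not horizontal, min_gap))
    return node


def leaves(node: Node) -> List[Tuple[int, int, int, int]]:
    """Collect leaf blocks as (top, left, bottom, right) tuples."""
    if not node.children:
        return [(node.top, node.left, node.bottom, node.right)]
    return [box for child in node.children for box in leaves(child)]


# Toy page: a full-width title line above two columns of text.
page = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]
tree = xy_cut(page, Node(0, 0, 4, 5))
print(leaves(tree))
```

On the toy page, the row profile of the whole page is the string `bwbb`, which splits into a title block and a body block; the body's column profile `bbwbb` then splits into two columns. A real system would attach labels to the blocks by parsing such strings against a grammar for the document family, which this sketch does not attempt.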