Accessing the structured content of PDF documents is a difficult task that requires pre-processing and reverse engineering techniques. In this paper, we first present different methods for accomplishing this task, based either on document image analysis or on electronic content extraction. We then propose XCDF, a canonical format with well-defined properties, as a suitable solution for representing structured electronic documents and as an entry point for further research. The system and methods used for reverse engineering PDF documents into this canonical format are also presented. Finally, we present current applications of this work in various domains, ranging from data mining to multimedia navigation, all of which benefit from the canonical format to access the content and structure of PDF documents.