In this paper we present a system for converting legacy PDF documents into a structured XML format. The system first extracts the different streams contained in a PDF file (text, bitmap images, and vector graphics) and then applies a series of components to express the logical structure of the document in XML. Some of these components are traditional in Document Analysis; others are more specific to PDF. We also present a graphical user interface for checking, correcting, and validating the output of the components. Finally, we report on two real user cases in which the system was applied.
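The two-stage flow described above (extract the streams, then apply components that add logical structure to the XML) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Fragment` type, the `build_raw_xml`, `drop_empty_text`, `mark_headings`, and `convert` functions, the element names, and the uppercase-heading heuristic are hypothetical and are not taken from the system itself. The PDF extraction step is not shown; the sketch starts from fragments that are assumed to have been extracted already.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Fragment:
    """One extracted item (hypothetical type, not the paper's data model)."""
    kind: str      # "text", "bitmap", or "vector"
    page: int
    content: str


# A component takes the XML tree and returns it with more structure.
Component = Callable[[ET.Element], ET.Element]


def build_raw_xml(fragments: List[Fragment]) -> ET.Element:
    """Stage 1 output: a flat XML tree, one element per extracted fragment."""
    root = ET.Element("document")
    pages = {}
    for frag in fragments:
        page_el = pages.get(frag.page)
        if page_el is None:
            page_el = ET.SubElement(root, "page", number=str(frag.page))
            pages[frag.page] = page_el
        el = ET.SubElement(page_el, frag.kind)
        el.text = frag.content
    return root


def drop_empty_text(root: ET.Element) -> ET.Element:
    """Illustrative component: remove whitespace-only text elements."""
    for page in list(root.iter("page")):
        for el in list(page):
            if el.tag == "text" and not (el.text or "").strip():
                page.remove(el)
    return root


def mark_headings(root: ET.Element) -> ET.Element:
    """Illustrative component: tag short all-uppercase lines as headings.

    This heuristic is an assumption chosen only to keep the sketch short.
    """
    for el in root.iter("text"):
        text = el.text or ""
        if text.isupper() and len(text) < 40:
            el.set("role", "heading")
    return root


def convert(fragments: List[Fragment], components: List[Component]) -> ET.Element:
    """Stage 2: apply each component in order to the raw XML tree."""
    root = build_raw_xml(fragments)
    for component in components:
        root = component(root)
    return root


fragments = [
    Fragment("text", 1, "INTRODUCTION"),
    Fragment("text", 1, "   "),
    Fragment("text", 1, "Legacy documents are ..."),
    Fragment("bitmap", 1, "figure-1.png"),
]
xml_root = convert(fragments, [drop_empty_text, mark_headings])
print(ET.tostring(xml_root, encoding="unicode"))
```

Keeping each component as a function from tree to tree means components can be reordered, added, or removed independently, and the intermediate tree after any step can be inspected or corrected by hand, which is the role the abstract assigns to the graphical user interface.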