Extraction, layout analysis and classification of diagrams in PDF documents
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
Creating reusable well-structured PDF as a sequence of component object graphic (COG) elements
Proceedings of the 2003 ACM symposium on Document engineering
Xed: A New Tool for eXtracting Hidden Structures from Electronic Documents
DIAL '04 Proceedings of the First International Workshop on Document Image Analysis for Libraries (DIAL'04)
Creating structured PDF files using XML templates
Proceedings of the 2004 ACM symposium on Document engineering
Towards a Canonical and Structured Representation of PDF Documents through Reverse Engineering
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Dolores: An Interactive and Class-Free Approach for Document Logical Restructuring
DAS '08 Proceedings of the 2008 The Eighth IAPR International Workshop on Document Analysis Systems
OCD: An Optimized and Canonical Document Format
ICDAR '09 Proceedings of the 2009 10th International Conference on Document Analysis and Recognition
A system for converting PDF documents into structured XML format
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems
XCDF: a canonical and structured document format
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems
PDF documents are widely used, but extracting and manipulating their structured content is not an easy task: it requires sophisticated pre-processing and reverse engineering techniques. In this paper, we present an improvement of XED that handles unresolved issues in the analysis of Arabic documents. A set of rules was proposed and implemented to enhance the extraction of Arabic content. The rules account for the different Arabic fonts by mapping the un-interpreted Unicode values to the other interpreted sets and by applying a reverse algorithm whenever needed. Finally, we present concrete evaluations of the improved XED.
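The two operations named in the abstract, mapping un-interpreted code points to interpreted ones and reversing the character order when needed, can be illustrated with a minimal sketch. It is not XED's actual rule set, which the abstract does not detail. It assumes that the un-interpreted values are Unicode Arabic presentation forms (the positional glyph variants that PDF fonts often emit), that NFKC normalization is an acceptable stand-in for the mapping, and that the text was stored in visual (left-to-right) order. The function name `normalize_arabic` and the `visual_order` flag are hypothetical.

```python
import unicodedata


def normalize_arabic(text: str, visual_order: bool = False) -> str:
    """Map Arabic presentation forms to base letters (illustrative sketch).

    Assumption: the raw extracted text uses presentation-form code points
    (U+FB50-U+FDFF, U+FE70-U+FEFF) and, when visual_order is True, stores
    the glyphs left to right as drawn on the page.
    """
    if visual_order:
        # Reverse before normalizing, so that a ligature such as lam-alef
        # decomposes into its letters in logical (reading) order.
        text = text[::-1]
    # NFKC replaces each presentation form with its compatibility
    # decomposition, i.e. the base Arabic letter(s).
    return unicodedata.normalize("NFKC", text)


# Hypothetical extraction result: meem (isolated), lam-alef ligature (final),
# seen (initial), stored in visual order.
raw = "\uFEE1\uFEFC\uFEB3"
logical = normalize_arabic(raw, visual_order=True)
print([f"U+{ord(c):04X}" for c in logical])
# ['U+0633', 'U+0644', 'U+0627', 'U+0645']
```

A whole-string reversal like this only suits runs of purely Arabic text. Lines that mix in digits or Latin words would need per-run handling, which is one reason a rule-based approach that reverses only "whenever needed" is plausible.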