PDF has become a very common format for exchanging printable documents. Moreover, it can be easily generated from the major document formats, which makes a huge number of PDF documents available over the Internet. However, its use is limited to displaying and printing, which considerably reduces search and retrieval capabilities. For this reason, additional tools have recently appeared that extract the textual content. However, their practical use is limited in that the reading order of the text is not necessarily preserved, especially for multi-column documents or in the presence of complex layouts. Our thesis is that these tools do not consider the hidden layout and logical structures of documents, which could greatly improve their results. We propose a novel approach to overcome the limitations of document content extraction by merging a) low-level extraction methods applied to PDF files with b) layout analysis performed on a synthetically generated TIFF image. The paper describes the various steps necessary to achieve this task. Finally, we present a first experiment on restoring the reading order of newspapers, which shows encouraging results.
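To make the idea of merging the two sources concrete, the following is a minimal Python sketch, not the method described in the paper. It assumes that low-level extraction yields text fragments with bounding boxes, that layout analysis of the rendered image yields rectangular blocks (for example, columns), and that both share one coordinate system with the origin at the top-left corner. The names `Box`, `Fragment`, and `reading_order`, as well as the left-to-right, top-to-bottom ordering rule, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle; origin at top-left, y grows downward (assumption)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains_center(self, other: "Box") -> bool:
        # A fragment belongs to a block if its center lies inside the block.
        cx = (other.x0 + other.x1) / 2
        cy = (other.y0 + other.y1) / 2
        return self.x0 <= cx <= self.x1 and self.y0 <= cy <= self.y1


@dataclass(frozen=True)
class Fragment:
    """A piece of text from low-level extraction, with its position on the page."""
    text: str
    box: Box


def reading_order(fragments: List[Fragment], blocks: List[Box]) -> List[str]:
    """Reorder extracted fragments using layout blocks (hypothetical rule).

    Blocks are visited left to right, then top to bottom; fragments inside
    a block are read top to bottom, then left to right.
    """
    ordered_blocks = sorted(blocks, key=lambda b: (b.x0, b.y0))
    remaining = list(fragments)
    ordered: List[Fragment] = []
    for block in ordered_blocks:
        inside = [f for f in remaining if block.contains_center(f.box)]
        inside.sort(key=lambda f: (f.box.y0, f.box.x0))
        ordered.extend(inside)
        remaining = [f for f in remaining if not block.contains_center(f.box)]
    # Fragments outside every block keep their extraction order, at the end.
    ordered.extend(remaining)
    return [f.text for f in ordered]
```

With two side-by-side column blocks and fragments extracted in an interleaved order (first line of the left column, first line of the right column, and so on), this sketch returns all fragments of the left column before those of the right column. A real newspaper page would need a richer ordering rule than this simple column model.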