Table of contents recognition for converting PDF documents in e-book formats
Proceedings of the 10th ACM symposium on Document engineering
Towards a faithful visualization of historical books on e-book readers
Proceedings of the 2011 Workshop on Historical Document Imaging and Processing
Web-based citation parsing, correction and augmentation
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Hi-index | 0.00 |
In this paper we analyze our recent research on the use of document analysis techniques for metadata extraction from PDF papers. We describe a package that is designed to extract basic metadata from these documents. The package is used in combination with a digital library software suite to easily build personal digital libraries. The proposed software is based on a suitable combination of several techniques that include PDF parsing, low level document image processing, and layout analysis. In addition, we use the information gathered from a widely known citation database (DBLP) to assist the tool in the difficult task of author identification. The system is tested on some paper collections selected from recent conference proceedings.