Large-scale digitization projects have been conducted at digital libraries to preserve cultural artifacts and to provide permanent access. The growing volume of digitized resources, including scanned books and scientific publications, requires the development of tools and methods that can efficiently analyze and manage large collections. In this work, we tackle the problem of extracting metadata from scanned volumes of journals. Our goal is to extract information describing the internal structure and content of scanned volumes, which is necessary for providing effective content access functionality to digital library users. We propose methods for automatically generating volume-level, issue-level, and article-level metadata based on format and text features extracted from OCRed text. We show the performance of our system on scanned, bound historical documents nearly two centuries old. We have developed the system and integrated it into an operational digital library, the Internet Archive, for real-world use.