Finding a catalog: generating analytical catalog records from well-structured digital texts

Authors: David Mimno, Alison Jones, Gregory Crane
Affiliation: Tufts University, Medford, MA (all three authors)
Venue: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries
Year: 2005

Abstract

One criticism library users often make of catalogs is that they rarely include information below the bibliographic level: it is generally impossible to search a catalog for the titles and subjects of particular chapters or volumes. There has been no way to add this information to catalog records without greatly increasing the workload of catalogers. At the same time, well-structured full-text XML transcriptions of printed works are becoming increasingly available. This paper describes how existing investments in full-text digitization and structural markup, combined with current named-entity extraction technology, can efficiently generate the detailed catalog data that users want at no significant additional cost. The system is demonstrated on an existing digital collection within the Perseus Digital Library.
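As a rough illustration of the idea (not the authors' implementation), the sketch below walks a structurally marked-up text and emits one analytical, catalog-style record per division below the bibliographic level. Each record carries the division's title and a list of subject headings. Several assumptions are labeled here. The input is assumed to use TEI-like `<div type="..." n="...">` elements with `<head>` titles, without a namespace. The named-entity extraction step is stood in for by a lookup against a small hypothetical authority list (`AUTHORITY`). The record fields (`AnalyticRecord`, `host_title`, `part_type`, and so on) and the sample text are likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical TEI-like input: nested volume/chapter divisions, each with a <head>.
# Real TEI files declare a namespace; it is omitted here for brevity.
SAMPLE = """<text>
  <body>
    <div type="volume" n="1">
      <head>The Early Voyages</head>
      <div type="chapter" n="1">
        <head>Departure from Lisbon</head>
        <p>Vasco da Gama sailed from Lisbon toward the Cape of Good Hope.</p>
      </div>
      <div type="chapter" n="2">
        <head>Arrival at Calicut</head>
        <p>The fleet reached Calicut after crossing the Indian Ocean.</p>
      </div>
    </div>
  </body>
</text>"""

# Hypothetical authority list standing in for a real named-entity extractor.
AUTHORITY = {
    "Lisbon": "place",
    "Cape of Good Hope": "place",
    "Calicut": "place",
    "Vasco da Gama": "person",
}


@dataclass
class AnalyticRecord:
    """One catalog-style record for a part of a larger work (hypothetical fields)."""
    host_title: str
    part_type: str      # e.g. "volume" or "chapter", taken from the div's type attribute
    part_number: str    # taken from the div's n attribute
    part_title: str     # taken from the div's <head>
    subjects: list[str] = field(default_factory=list)


def find_entities(text: str) -> list[str]:
    """Return authority headings found in text, ordered by first occurrence."""
    hits = [(text.find(name), name) for name in AUTHORITY if name in text]
    return [name for _, name in sorted(hits)]


def analytic_records(xml_text: str, host_title: str) -> list[AnalyticRecord]:
    """Emit one record per <div>, in document order."""
    root = ET.fromstring(xml_text)
    records = []
    for div in root.iter("div"):
        head = div.find("head")
        title = head.text.strip() if head is not None and head.text else ""
        # itertext() includes nested divisions, so a volume's subjects
        # cover every chapter it contains.
        body = " ".join(div.itertext())
        records.append(AnalyticRecord(
            host_title=host_title,
            part_type=div.get("type", ""),
            part_number=div.get("n", ""),
            part_title=title,
            subjects=find_entities(body),
        ))
    return records


if __name__ == "__main__":
    for rec in analytic_records(SAMPLE, host_title="A Hypothetical History of Voyages"):
        print(f"{rec.part_type} {rec.part_number}: {rec.part_title} | "
              f"subjects: {'; '.join(rec.subjects)}")
```

The structural markup supplies the part titles essentially for free. Only the subject step needs any extraction, which is why the cost scales with the existing digitization effort rather than with catalogers' time. Letting a parent division aggregate its children's subjects is one design choice among several. A real system might instead keep only entities that are frequent within a given division.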