Identifying and Merging Related Bibliographic Records

Authors:
J. A. Hylton
Affiliations:
-
Venue:
Identifying and Merging Related Bibliographic Records
Year:
1996

Citing 0
Cited 8

Learning domain-independent string transformation weights for high accuracy object identification
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

Efficient similarity-based operations for data integration
Data & Knowledge Engineering

Comparative study of name disambiguation problem using a scalable blocking-based framework
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries

Knowledge Accumulation and Resolution of Data Inconsistencies during the Integration of Microbial Information Sources
IEEE Transactions on Knowledge and Data Engineering

Effective and scalable solutions for mixed and split citation problems in digital libraries
Proceedings of the 2nd international workshop on Information quality in information systems

An intelligent speech interface for personal assistants applied to knowledge management
Web Intelligence and Agent Systems

Bottom-Up Extraction and Trust-Based Refinement of Ontology Metadata
IEEE Transactions on Knowledge and Data Engineering

Efficient Identification of Duplicate Bibliographical References
Proceedings of the 2005 conference on Advances in Logic Based Intelligent Systems: Selected Papers of LAPTEC 2005

Abstract

Bibliographic records freely available on the Internet can be used to construct a high-quality, digital finding aid that provides the ability to discover paper and electronic documents. The key challenge to providing such a service is integrating mixed-quality bibliographic records, coming from multiple sources and in multiple formats. This thesis describes an algorithm that automatically identifies records that refer to the same work and clusters them together; the algorithm clusters records for which both author and title match. It tolerates errors and cataloging variations within the records by using a full-text search engine and an $n$-gram-based approximate string matching algorithm to build the clusters. The algorithm identifies more than 90 percent of the related records and includes incorrect records in less than 1 percent of the clusters. It has been used to construct a 250,000-record collection of the computer science literature. This thesis also presents preliminary work on automatic linking between bibliographic records and copies of documents available on the Internet.
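
The clustering idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the n-gram size, the similarity measure, or the match threshold, so the character trigrams, the Dice coefficient, and the 0.8 threshold below are all assumptions. The full-text search engine that the thesis uses to find candidate records is replaced here by a plain pairwise comparison, and the names `ngrams`, `similarity`, and `cluster_records` are hypothetical.

```python
from collections import Counter


def ngrams(text, n=3):
    """Return a multiset of character n-grams of the normalized text."""
    # Lowercase and collapse whitespace so trivial cataloging
    # differences do not affect the comparison.
    normalized = " ".join(text.lower().split())
    # Pad so that leading and trailing characters appear in n grams too.
    padded = " " * (n - 1) + normalized + " " * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def similarity(a, b, n=3):
    """Dice coefficient over n-gram multisets (assumed measure)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 1.0
    # Counter & Counter keeps the minimum count of each shared n-gram.
    shared = sum((grams_a & grams_b).values())
    return 2.0 * shared / total


def cluster_records(records, threshold=0.8):
    """Group records whose author AND title are both approximately equal.

    The threshold is an assumed value, not one taken from the thesis.
    """
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison stands in for search-engine candidate lookup.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            if (similarity(a["author"], b["author"]) >= threshold
                    and similarity(a["title"], b["title"]) >= threshold):
                parent[find(j)] = find(i)

    clusters = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record)
    return list(clusters.values())
```

Calling `cluster_records` on a list of dictionaries with `author` and `title` keys returns a list of clusters, each a list of records. The pairwise loop is quadratic in the number of records, which is why a collection of the size mentioned above would need an index or search engine to narrow the candidates first.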