Extending metadata definitions by automatically extracting and organizing glossary definitions

  • Authors:
  • Eduard Hovy;Andrew Philpot;Judith Klavans;Ulrich Germann;Peter Davis;Samuel Popper

  • Affiliations:
  • University of Southern California, Marina del Rey, CA;University of Southern California, Marina del Rey, CA;Columbia University, MC, New York, NY;University of Southern California, Marina del Rey, CA;Columbia University, MC, New York, NY;Columbia University, MC, New York, NY

  • Venue:
  • dg.o '03 Proceedings of the 2003 annual national conference on Digital government research
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

Metadata descriptions of database contents are required to build and use systems that access and deliver data in response to user requests. When numerous heterogeneous databases are brought together in a single system, their various metadata formalizations must be homogenized and integrated in order to support the access planning and delivery system. This integration is a tedious process that requires human expertise and attention. In this paper we describe a method of speeding up the formalization and integration of new metadata. The method takes advantage of the fact that databases are often described in web pages containing natural language glossaries that define pertinent aspects of the data. Given a root URL, our method identifies likely glossaries, extracts and formalizes aspects of relevant concepts defined in them, and automatically integrates the new formalized metadata concepts into a large model of the domain and associated conceptualizations.