Extending metadata definitions by automatically extracting and organizing glossary definitions

Authors:
Eduard Hovy; Andrew Philpot; Judith Klavans; Ulrich Germann; Peter Davis; Samuel Popper
Affiliations:
University of Southern California, Marina del Rey, CA; University of Southern California, Marina del Rey, CA; Columbia University, MC, New York, NY; University of Southern California, Marina del Rey, CA; Columbia University, MC, New York, NY; Columbia University, MC, New York, NY
Venue:
dg.o '03 Proceedings of the 2003 annual national conference on Digital government research
Year:
2003

Abstract

Metadata descriptions of database contents are required to build and use systems that access and deliver data in response to user requests. When numerous heterogeneous databases are brought together in a single system, their various metadata formalizations must be homogenized and integrated to support the access-planning and delivery system. This integration is a tedious process that requires human expertise and attention. In this paper we describe a method for speeding up the formalization and integration of new metadata. The method takes advantage of the fact that databases are often described in web pages containing natural language glossaries that define pertinent aspects of the data. Given a root URL, our method identifies likely glossaries, extracts and formalizes aspects of the relevant concepts defined in them, and automatically integrates the newly formalized metadata concepts into a large model of the domain and its associated conceptualizations.
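The pipeline described above (identify likely glossaries, extract definitions, integrate them into a domain model) can be illustrated with a minimal Python sketch. This is not the authors' system: it assumes glossaries are marked up as HTML `<dt>`/`<dd>` definition lists, uses a hypothetical cue-word heuristic to decide whether a page is a glossary, guesses a hypernym as the head word of each definition's opening noun phrase (a simple genus-extraction heuristic), and stands in a plain dictionary for the domain model. All function names, cue words, and thresholds are illustrative assumptions, and crawling outward from the root URL is omitted.

```python
"""Hypothetical sketch: glossary detection, definition extraction, merging.

Illustrative only; heuristics and names are assumptions, not the paper's method.
"""
import re
from html.parser import HTMLParser

GLOSSARY_CUES = ("glossary", "definitions", "terminology")  # assumed cue words
STOP_WORDS = {"of", "that", "which", "who", "used", "for", "in", "from", "with",
              "by", "to", "derived", "produced", "consisting", "and", "or"}


class DefinitionListParser(HTMLParser):
    """Collects the page title and (term, definition) pairs from <dt>/<dd>."""

    def __init__(self):
        super().__init__()
        self.pairs, self.title = [], ""
        self._current, self._buffer, self._term = None, [], None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._flush()  # also handles HTML that omits </dt> or </dd>
            self._current = tag
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in ("dt", "dd", "dl"):
            self._flush()
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._current:
            self._buffer.append(data)

    def _flush(self):
        text = " ".join("".join(self._buffer).split())  # normalize whitespace
        if self._current == "dt" and text:
            self._term = text
        elif self._current == "dd" and text and self._term:
            self.pairs.append((self._term, text))
            self._term = None
        self._current, self._buffer = None, []


def looks_like_glossary(url, title, pairs, min_entries=3):
    """Assumed heuristic: a cue word plus some entries, or many entries."""
    cue_hit = any(cue in (url + " " + title).lower() for cue in GLOSSARY_CUES)
    return (cue_hit and len(pairs) >= 1) or len(pairs) >= min_entries


def guess_genus(definition):
    """Guess a hypernym: last word of the opening phrase before a stop word."""
    words = re.findall(r"[A-Za-z-]+", definition.lower())
    if words and words[0] in ("a", "an", "the"):
        words = words[1:]
    phrase = []
    for word in words[:4]:
        if word in STOP_WORDS:
            break
        phrase.append(word)
    return phrase[-1] if phrase else None


def integrate(model, pairs, source):
    """Merge entries into a dict-based concept model (hypothetical schema)."""
    for term, definition in pairs:
        concept = model.setdefault(
            term.lower(), {"term": term, "definitions": [], "genus": set()})
        concept["definitions"].append((source, definition))
        genus = guess_genus(definition)
        if genus:
            concept["genus"].add(genus)
    return model


def process_page(model, url, html):
    """Parse one page; integrate its entries only if it looks like a glossary."""
    parser = DefinitionListParser()
    parser.feed(html)
    parser.close()
    if looks_like_glossary(url, parser.title, parser.pairs):
        integrate(model, parser.pairs, url)
        return True
    return False


if __name__ == "__main__":
    page = ("<html><head><title>Energy Glossary</title></head><body><dl>"
            "<dt>Net generation</dt><dd>The amount of gross generation less the "
            "electrical energy consumed at the generating station.</dd></dl>"
            "</body></html>")
    model = {}
    process_page(model, "http://example.org/glossary.html", page)
    print(sorted(model))                      # ['net generation']
    print(model["net generation"]["genus"])   # {'amount'}
```

Keying concepts by lower-cased term lets definitions of the same term from different sources accumulate on one entry, which is where a fuller system would need real reconciliation against the existing domain model rather than a string match.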