Mining the Web to Create Specialized Glossaries

  • Authors:
  • Paola Velardi;Roberto Navigli;Pierluigi D'Amadio

  • Affiliations:
  • University of Rome "La Sapienza.";University of Rome "La Sapienza.";IT consultant

  • Venue:
  • IEEE Intelligent Systems
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Glossaries are helpful for integrating information, reducing semantic heterogeneity, and facilitating communication between information systems. Commercial publishers charge lexicographers with building glossaries, but this isn't appropriate when a domain's semantics are continuously evolving rather than precisely characterized, as in emerging Web communities and interest groups. In emerging domains, glossary building is the cooperative effort of a team of domain experts. It involves several steps, including identifying the domain-relevant terminology, defining each term, and harmonizing the results. This is a time-consuming, costly process that often requires support from a collaborative platform to facilitate shared decisions and validation. TermExtractor and GlossExtractor, two Web applications based on Web mining techniques, support this complete glossary-building procedure. The tools exploit the Web's evolving nature, allowing one to continually update the emerging community's vocabulary. TermExtractor and GlossExtractor, which were used in the European project Interop, are freely available and are being used in experiments in different domains across the world. This article is part of a special issue on Natural Language Processing and the Web.