Mining the Web for acronyms using the duality of patterns and relations

  • Authors: Jeonghee Yi; Neel Sundaresan
  • Affiliations: Computer Science, UCLA; IBM Almaden Research Center
  • Venue: Proceedings of the 2nd international workshop on Web information and data management
  • Year: 1999

Abstract

The Web is a rich source of information, but this information is scattered and hidden in the diversity of web pages. Search engines are windows to the web. However, current search engines, designed to identify pages containing specified phrases, have very limited power. For example, they cannot search for phrases related in a particular way (e.g. books and their authors). In this paper we present a solution for identifying sets of inter-related information on the web using the duality concept. Duality problems arise when one tries to identify pairs of inter-related phrases, such as (book, author), (name, email) or (acronym, expansion) relations. We propose a solution to this problem that iteratively refines mutually dependent approximations to their identifications. Specifically, we iteratively refine i) pairs of phrases related in a specific way, and ii) the patterns of their occurrences in web pages, i.e. the ways in which the related phrases are marked in the pages. We cast light on the general solution of duality problems on the web by concentrating on one paradigmatic duality problem, namely identifying (acronym, expansion) pairs in terms of the patterns of their occurrences in web pages. The solution to this problem involves two mutually dependent dualities: 1) the duality between the related pairs and their patterns, and 2) the duality between the related pairs and the acronym formulation rules.
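The iterative refinement described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy pages, the representation of a pattern as the text between and after the two phrases, and the single fixed formation rule (acronym = initials of the expansion's words) are all assumptions made for illustration. In particular, the sketch only alternates between pairs and patterns; it does not learn acronym formulation rules, which the paper treats as a second duality.

```python
import re

# Toy corpus, invented for illustration only.
PAGES = [
    "The World Wide Web (WWW) was invented at CERN.",
    "Pages are written in Hyper Text Markup Language (HTML) and fetched by name.",
    "The Uniform Resource Locator (URL) is a standard address format.",
    "A Uniform Resource Locator - URL - addresses a page.",
    "Files move over File Transfer Protocol - FTP - sessions.",
]
SEED = {("WWW", "World Wide Web")}


def is_acronym_of(acronym, expansion):
    """Assumed formation rule: the acronym is the initials of the expansion's words."""
    initials = "".join(word[0] for word in expansion.split())
    return initials.upper() == acronym.upper()


def find_patterns(pairs, pages):
    """Known pairs -> patterns: record the text between and just after the two phrases."""
    patterns = set()
    for acronym, expansion in pairs:
        regex = re.compile(
            re.escape(expansion) + r"(\W{1,5})" + re.escape(acronym) + r"(\W?)"
        )
        for page in pages:
            for match in regex.finditer(page):
                patterns.add((match.group(1), match.group(2)))
    return patterns


def find_pairs(patterns, pages):
    """Known patterns -> pairs: match capitalized word runs followed by an acronym."""
    pairs = set()
    for middle, suffix in patterns:
        regex = re.compile(
            r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
            + re.escape(middle) + r"([A-Z]{2,})" + re.escape(suffix)
        )
        for page in pages:
            for match in regex.finditer(page):
                acronym = match.group(2)
                # Keep only the last len(acronym) words, so a leading "The" is dropped.
                words = match.group(1).split()[-len(acronym):]
                expansion = " ".join(words)
                if is_acronym_of(acronym, expansion):
                    pairs.add((acronym, expansion))
    return pairs


def mine(seed_pairs, pages, rounds=3):
    """Alternate between the two refinements until no new pair appears."""
    pairs, patterns = set(seed_pairs), set()
    for _ in range(rounds):
        patterns |= find_patterns(pairs, pages)
        new_pairs = find_pairs(patterns, pages)
        if new_pairs <= pairs:
            break
        pairs |= new_pairs
    return pairs, patterns


if __name__ == "__main__":
    found_pairs, found_patterns = mine(SEED, PAGES)
    for acronym, expansion in sorted(found_pairs):
        print(acronym, "=", expansion)
```

On this toy corpus, the seed pair yields the parenthesized pattern, which in turn yields the HTML and URL pairs; the URL pair then exposes a second, dash-delimited pattern, through which the FTP pair is found. This is the mutual dependence the abstract refers to: better pairs give better patterns, and better patterns give more pairs.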