Web directory construction using lexical chains

  • Authors:
  • Sofia Stamou;Vlassis Krikos;Pavlos Kokosis;Alexandros Ntoulas;Dimitris Christodoulakis

  • Affiliations:
  • Computer Technology Institute, Computer Engineering Department, Patras University, Patras, Greece;Computer Technology Institute, Computer Engineering Department, Patras University, Patras, Greece;Computer Technology Institute, Computer Engineering Department, Patras University, Patras, Greece;Computer Science Department, University of California, Los Angeles;Computer Technology Institute, Computer Engineering Department, Patras University, Patras, Greece

  • Venue:
  • NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems
  • Year:
  • 2005

Abstract

Web Directories provide a way of locating relevant information on the Web. Typically, Web Directories rely on humans investing significant time and effort in finding important pages on the Web and categorizing them in the Directory. In this paper we present a method for automating the creation of a Web Directory. At a high level, our method takes as input a subject hierarchy and a collection of pages. We first leverage a variety of lexical resources from the Natural Language Processing community to enrich the hierarchy. We then process the pages and identify sequences of important terms, referred to as lexical chains. Finally, we use the lexical chains to decide where in the enriched subject hierarchy each page should be assigned. Our experimental results with real Web data show that our method is promising for assisting humans during page categorization.
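
The pipeline outlined in the abstract (enriched hierarchy, lexical chains, page assignment) can be illustrated with a toy sketch. This is not the authors' implementation: the hierarchy contents, the names `ENRICHED_HIERARCHY`, `build_chain`, and `assign_page`, and the scoring rule are all assumptions made for illustration. In particular, a chain here is simply the ordered run of page terms that fall within a topic's vocabulary, whereas a real system would presumably link terms through semantic relations drawn from lexical resources.

```python
import re

# Hypothetical enriched hierarchy: each topic path maps to a set of related
# terms. In practice these terms would come from lexical resources; here
# they are hand-written purely for illustration.
ENRICHED_HIERARCHY = {
    "Sports/Soccer": {"soccer", "goal", "league", "striker", "match"},
    "Computers/Programming": {"code", "compiler", "python", "function", "bug"},
}


def build_chain(tokens, vocabulary):
    """Return, in document order, the tokens that belong to the vocabulary.

    This is a crude stand-in for a lexical chain: terms are linked only by
    shared membership in one topic's term set.
    """
    return [token for token in tokens if token in vocabulary]


def assign_page(text, hierarchy=ENRICHED_HIERARCHY):
    """Score every topic by chain coverage and return (best_topic, scores).

    The score (an assumed heuristic) is the fraction of page tokens that
    end up in the topic's chain. best_topic is None when no topic matches.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = {}
    for topic, vocabulary in hierarchy.items():
        chain = build_chain(tokens, vocabulary)
        scores[topic] = len(chain) / len(tokens) if tokens else 0.0
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


if __name__ == "__main__":
    topic, scores = assign_page(
        "The striker scored a late goal in the league match"
    )
    print(topic, scores)
```

A page that matches no topic vocabulary is left unassigned (`None`) rather than forced into a category, which fits the abstract's framing of the method as an aid to human categorization.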