Classifying web data in directory structures

  • Authors:
  • Sofia Stamou; Alexandros Ntoulas; Vlassis Krikos; Pavlos Kokosis; Dimitris Christodoulakis

  • Affiliations:
  • Computer Engineering and Informatics Department, Patras University, Patras, Greece; Computer Science Department, University of California, Los Angeles (UCLA); Computer Engineering and Informatics Department, Patras University, Patras, Greece; Computer Engineering and Informatics Department, Patras University, Patras, Greece; Computer Engineering and Informatics Department, Patras University, Patras, Greece

  • Venue:
  • APWeb'06: Proceedings of the 8th Asia-Pacific Web Conference on Frontiers of WWW Research and Development
  • Year:
  • 2006

Abstract

Web Directories have emerged as an alternative to search engines for locating information on the Web. Typically, Web Directories rely on humans who invest significant time and effort in finding important pages on the Web and categorizing them in the Directory. In this paper, we experimentally study the automatic population of a Web Directory by means of a subject hierarchy. For our study, we constructed a subject hierarchy for the top-level topics offered in Dmoz by leveraging ontological content from available lexical resources. We first describe how we built our subject hierarchy. Then, we present in detail how the hierarchy can help in the construction of a Directory. We also introduce a ranking formula for sorting the pages listed in every Directory topic based on the pages’ quality, and we experimentally compare the efficiency of our approach with that of other popular methods for creating Directories.