Semantic partitioning of web pages

  • Authors:
  • Srinivas Vadrevu; Fatih Gelgi; Hasan Davulcu

  • Affiliations:
  • Department of Computer Science and Engineering, Arizona State University, Tempe, AZ (all three authors)

  • Venue:
  • WISE'05: Proceedings of the 6th International Conference on Web Information Systems Engineering
  • Year:
  • 2005

Abstract

In this paper we describe the semantic partitioner algorithm, which uses the structural and presentation regularities of Web pages to automatically transform them into hierarchical content structures. These content structures enable us to automatically annotate labels in the Web pages with their semantic roles, thus yielding meta-data and instance information for the Web pages. Experimental results with the TAP knowledge base and computer science department Web sites, comprising 16,861 Web pages, indicate that our algorithm is able to gather meta-data accurately from various types of Web pages. The algorithm achieves this performance without any domain-specific engineering.
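
To make the notion of "structural regularity" concrete, the following Python sketch groups the text labels of a page by their root-to-node tag path, on the assumption that labels sharing a path (for example, all `li` items under a `ul`) play the same role. This is only a toy illustration and not the authors' semantic partitioner, whose details the abstract does not give; the names `PathCollector` and `group_by_path` and the tag-path heuristic itself are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags that never enclose content, so they are kept off the path stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Record each text label together with its root-to-node tag path."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.labels = []  # (tag path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.labels.append(("/".join(self.stack), text))


def group_by_path(html):
    """Group labels by tag path (assumed heuristic: same path, same role)."""
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    groups = defaultdict(list)
    for path, text in parser.labels:
        groups[path].append(text)
    return dict(groups)
```

Applied to a small department page with two `h2` headings, each followed by a `ul` list, this would place the headings in one group and the list items in another. Deciding which groups are meta-data and which are instances, as the abstract describes, is a further step that the sketch does not attempt.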