A web classification framework based on XSLT

  • Authors:
  • Atakan Kurt; Engin Tozal

  • Affiliations:
  • Computer Eng. Dept., Fatih University, Istanbul, Turkey; Computer Eng. Dept., Fatih University, Istanbul, Turkey

  • Venue:
  • APWeb'06 Proceedings of the 2006 international conference on Advanced Web and Network Technologies, and Applications
  • Year:
  • 2006

Abstract

Data on the web is gradually changing format from HTML to XML/XSLT, driven by software and hardware requirements such as interoperability and data sharing between different applications and platforms, and by devices with various capabilities such as cell phones and PDAs. This gradual change introduces new challenges in web page and web site classification. HTML is used for presentation of content. XML represents content in a hierarchical manner. XSLT is used to transform XML documents into different formats such as HTML and WML. Classifying a web page from its HTML alone or from its XML alone has certain drawbacks. In this paper we propose a new classification method based on XSLT that combines the advantages of HTML and XML classification. We also introduce a web classification framework that utilizes XSLT classification. Finally, we show that, using a Naïve Bayes classifier, XSLT classification outperforms both HTML and XML classification.
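
To illustrate the kind of classifier the abstract refers to, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with Laplace smoothing. It is not the authors' implementation: the abstract does not say how features are derived from XSLT, so the token lists, class labels, and the `NaiveBayes` class below are hypothetical stand-ins for whatever terms a real feature-extraction step would produce.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists (illustrative sketch only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.class_docs = Counter()              # number of documents per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        self.vocab = set()                       # all tokens seen in training

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_docs[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens):
        """Return log P(class) + sum of log P(token | class) for each class."""
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_docs.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)
            for token in tokens:
                if token in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((counts[token] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical training data: each page is already reduced to a token list.
train_docs = [
    ["price", "cart", "buy", "shipping"],
    ["buy", "discount", "price", "order"],
    ["lecture", "course", "exam", "syllabus"],
    ["course", "professor", "lecture", "homework"],
]
train_labels = ["shopping", "shopping", "education", "education"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["buy", "price", "today"]))
print(model.predict(["exam", "course"]))
```

In a comparison like the one the abstract describes, the same classifier would be trained on token lists taken from different representations of a page (HTML, XML, or XSLT-based), so that only the feature source varies between runs.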