Classification of XSLT-Generated web documents with support vector machines

Authors:
Atakan Kurt;Engin Tozal
Affiliations:
Computer Eng. Dept., Fatih University, Istanbul, Turkey;Computer Eng. Dept., Fatih University, Istanbul, Turkey
Venue:
KDXD'06 Proceedings of the First international conference on Knowledge Discovery from XML Documents
Year:
2006

Abstract

XSLT is a transformation language mainly used for converting XML documents to HTML or other formats. Due to its simplicity and flexibility, XML has replaced traditional EDI file formats. Most e-business applications store data in XML, convert the XML into HTML using XSLT, and publish the HTML documents on the web. In this paper we argue that the use of XSLT presents an opportunity rather than a challenge for web document classification. We show that the advantages of both HTML and XML can be combined by classifying documents at the XSLT transformation stage, an approach we call XSLT classification, to attain higher classification rates using Support Vector Machines (SVM). The results are both expected and promising. We believe that XSLT classification can become preferable to HTML or XML classification wherever XSLT stylesheets are available.
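
To make the idea of classifying at the transformation stage more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the stylesheet can be approximated by a simple lookup from XML element names to HTML tags; the names `TAG_MAP` and `xslt_stage_features` and the sample document are hypothetical. Each term is keyed by both its XML element (structure) and the HTML tag it would be rendered as (presentation), which is one plausible way to use information from both formats.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for a stylesheet: the HTML tag each XML element
# would be rendered as. A real XSLT stylesheet is far more expressive.
TAG_MAP = {"title": "h1", "summary": "p", "price": "b"}


def xslt_stage_features(xml_text, tag_map):
    """Count terms keyed by 'xml_element|html_tag|token'."""
    features = Counter()
    for elem in ET.fromstring(xml_text).iter():
        text = (elem.text or "").strip()
        if not text:
            continue  # skip elements that only contain child elements
        html_tag = tag_map.get(elem.tag, "span")  # assumed default tag
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            features[f"{elem.tag}|{html_tag}|{token}"] += 1
    return features


doc = """<product>
  <title>Wireless Mouse</title>
  <summary>Compact wireless mouse with USB receiver</summary>
  <price>19</price>
</product>"""

features = xslt_stage_features(doc, TAG_MAP)
for name, count in sorted(features.items()):
    print(name, count)
```

In this sketch the word "wireless" yields two distinct features, one from the title and one from the summary, so a classifier could weight them differently. The resulting counts could then be turned into vectors and passed to any SVM implementation; that step is omitted here.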