Feature Matrix Extraction and Classification of XML Pages

Authors:
Hongcan Yan;Dianchuan Jin;Lihong Li;Baoxiang Liu;Yanan Hao
Affiliations:
School of Management, TianJin University, Tianjin, China 300072 and College of Sciences, HeBei Polytechnic University, Hebei, China 063009;College of Sciences, HeBei Polytechnic University, Hebei, China 063009;College of Sciences, HeBei Polytechnic University, Hebei, China 063009;College of Sciences, HeBei Polytechnic University, Hebei, China 063009;School of Computer Science and Mathematics, Victoria University, Australia
Venue:
Advanced Web and Network Technologies, and Applications
Year:
2008


Abstract

As the volume of data on the Web grows, the limitations of HTML become increasingly evident. A method is needed that separates data from its presentation, and XML (eXtensible Markup Language) arose to meet that need. XML can serve as the main format for representing and exchanging data. How to store, manage, and use such data effectively remains an open problem for the Internet, and automatic text classification is an important part of it. In this article, we propose a data model that analyzes documents using both their hierarchical structure and their keyword information. Experiments show that the model achieves high accuracy at a lower time cost.
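
The abstract does not specify how the feature matrix is built. As an illustration only, the following Python sketch shows one way structural and keyword information could be combined: each feature is a pair of an element path and a keyword, and each matrix row counts those pairs for one document. The function names `structure_term_counts` and `feature_matrix`, the path encoding, and the simple lowercase tokenizer are all assumptions made for this sketch, not the authors' method.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def structure_term_counts(xml_text):
    """Count (element path, keyword) pairs in one XML document.

    Hypothetical helper: the path encoding and tokenizer are assumptions.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(node, path):
        current = f"{path}/{node.tag}"
        # Only text directly inside the element is attributed to its path;
        # tail text after child elements is ignored in this sketch.
        for term in re.findall(r"[a-z]+", (node.text or "").lower()):
            counts[(current, term)] += 1
        for child in node:
            walk(child, current)

    walk(root, "")
    return counts


def feature_matrix(docs):
    """Return (features, matrix): one row of counts per document."""
    per_doc = [structure_term_counts(d) for d in docs]
    # The union of all observed (path, keyword) pairs defines the columns.
    features = sorted(set().union(*per_doc))
    matrix = [[counts[f] for f in features] for counts in per_doc]
    return features, matrix


docs = [
    "<article><title>XML mining</title><body>XML data</body></article>",
    "<article><title>web data</title></article>",
]
features, matrix = feature_matrix(docs)
for feature, column in zip(features, zip(*matrix)):
    print(feature, column)
```

In this sketch the keyword "data" under `/article/body` and under `/article/title` yields two distinct columns, which is the sense in which the hierarchical position of a term is kept alongside the term itself. The resulting rows could then be passed to any standard classifier.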