Extended VSM for XML document classification using frequent subtrees

Authors:
Jianwu Yang;Songlin Wang
Affiliations:
Institute of Computer Sci. & Tech., Peking University, Beijing, China;Institute of Computer Sci. & Tech., Peking University, Beijing, China
Venue:
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval
Year:
2009

Abstract

The structured link vector model (SLVM) is a representation for XML documents that extends the conventional vector space model (VSM) by incorporating document structure. In this paper, we describe the SLVM-based classification approach for XML documents used in the Document Mining Challenge of INEX 2009, in which closed frequent subtrees serve as the structural units for extracting content from XML documents and the Chi-square test is used for feature selection.
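
As a rough illustration of the Chi-square feature selection step mentioned above, the sketch below scores each term against one target class using the standard 2x2 contingency-table statistic from the text categorization literature. The abstract does not give the exact formulation used in the paper, so the function names (`chi_square`, `top_k_terms`), the representation of documents as sets of terms, and the one-class-versus-rest setup are all assumptions made for this example.

```python
from collections import Counter


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square score of a term for a class from a 2x2 contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (e.g. an empty class) carries no information.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def top_k_terms(docs, labels, target, k):
    """Return the k terms with the highest chi-square score for `target`.

    `docs` is a list of term collections and `labels` the matching class
    labels (both hypothetical inputs for this sketch). Ties are broken
    alphabetically so the result is deterministic.
    """
    n_pos = sum(1 for y in labels if y == target)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for terms, y in zip(docs, labels):
        # Count document frequency, so each term counts once per document.
        (pos_df if y == target else neg_df).update(set(terms))
    scores = {}
    for term in set(pos_df) | set(neg_df):
        a, b = pos_df[term], neg_df[term]
        scores[term] = chi_square(a, b, n_pos - a, n_neg - b)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [{"xml", "tree"}, {"xml", "svm"}, {"text", "svm"}, {"text", "tree"}]
labels = ["a", "a", "b", "b"]
selected = top_k_terms(docs, labels, target="a", k=2)
```

In this toy corpus, "xml" and "text" each separate the two classes perfectly, while "tree" and "svm" occur equally often in both and score zero. In an SLVM setting the terms would presumably be tied to the structural units they occur in, but that detail is not specified in the abstract and is left out here.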