PKU at INEX 2010 XML mining track

Authors:
Songlin Wang;Feng Liang;Jianwu Yang
Affiliations:
Institute of Computer Sci. & Tech., Peking University, Beijing, China;Institute of Computer Sci. & Tech., Peking University, Beijing, China;Institute of Computer Sci. & Tech., Peking University, Beijing, China
Venue:
INEX'10 Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval
Year:
2010

Abstract

This paper presents our participation in the INEX 2010 XML Mining track. Our classification and clustering solutions for XML documents use both structural and content information: frequent subtrees serve as structural units for extracting content from each XML document. In addition, we used WordNet and link information to improve performance, and applied the structured link vector model (SLVM) for classification.
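The pipeline described in the abstract (structural units, then content extraction per unit, then a structured vector model for classification) can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper. Root-to-node tag paths stand in for frequent subtrees, and a path counts as "frequent" if it appears in at least a hypothetical `min_support` fraction of training documents. The similarity is a simplified SLVM-style score: the mean per-unit cosine, which treats the unit-similarity matrix as the identity. Classification is done by nearest class centroid. WordNet expansion and link information are omitted. All function names here are illustrative.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

TOKEN = re.compile(r"[a-z]+")

def leaf_paths(root):
    """Yield (tag_path, text) for every element that carries text.
    Assumption: tag paths approximate the paper's frequent subtrees."""
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        text = (node.text or "").strip()
        if text:
            yield path, text
        for child in node:
            stack.append((child, path + "/" + child.tag))

def frequent_units(docs, min_support=0.5):
    """Keep paths that occur in at least min_support of the documents."""
    df = Counter()
    for root in docs:
        df.update({p for p, _ in leaf_paths(root)})
    return {p for p, c in df.items() if c / len(docs) >= min_support}

def structured_vector(root, units):
    """Map each frequent unit to a term-frequency Counter of its content."""
    vec = defaultdict(Counter)
    for path, text in leaf_paths(root):
        if path in units:
            vec[path].update(TOKEN.findall(text.lower()))
    return vec

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similarity(x, y, units):
    """Simplified SLVM-style score: mean per-unit cosine similarity."""
    if not units:
        return 0.0
    return sum(cosine(x.get(u, Counter()), y.get(u, Counter()))
               for u in units) / len(units)

def train_centroids(labeled, units):
    """Sum raw term frequencies per class and unit; cosine ignores scale."""
    cents = defaultdict(lambda: defaultdict(Counter))
    for root, label in labeled:
        for u, tf in structured_vector(root, units).items():
            cents[label][u].update(tf)
    return cents

def classify(root, cents, units):
    """Assign the class whose centroid is most similar to the document."""
    v = structured_vector(root, units)
    return max(cents, key=lambda c: similarity(v, cents[c], units))

# Toy usage with made-up documents (not INEX data).
docs = [
    ("<article><title>support vector machines</title><body>kernel margin classifier</body></article>", "ml"),
    ("<article><title>neural networks</title><body>training classifier layers</body></article>", "ml"),
    ("<article><title>river basin</title><body>water flow delta</body></article>", "geo"),
    ("<article><title>mountain range</title><body>peak glacier water</body></article>", "geo"),
]
roots = [(ET.fromstring(x), y) for x, y in docs]
units = frequent_units([r for r, _ in roots])
cents = train_centroids(roots, units)
test = ET.fromstring("<article><title>mountain river</title><body>glacier water</body></article>")
print(classify(test, cents, units))  # geo
```

Scoring each unit separately keeps a term's structural context: a word matched in `title` contributes only to the title unit's cosine, not to the body's. A fuller SLVM would use a learned or non-identity unit-similarity matrix and weight units unequally.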