A semi-structured document carries more structural information than an ordinary document, and the relations among semi-structured documents can also be exploited. To take advantage of the structure and link information in semi-structured documents for better mining, this paper presents a structured link vector model (SLVM), in which each document is represented by a vector whose elements are determined by terms, document structure, and neighboring documents. For brevity and clarity, text mining with SLVM is described through the two core steps of K-means: calculating document similarity and calculating the cluster center. In the experiments, clustering based on SLVM performs significantly better than clustering based on a conventional vector space model, with the F value increasing from 0.65-0.73 to 0.82-0.86.
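
The abstract gives no formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's actual SLVM definition. It assumes that a term's weight is scaled by a per-element weight (`element_weights`, hypothetical), that linked documents contribute their averaged vectors scaled by a hypothetical `link_weight`, and that similarity is plain cosine. The two K-means steps named above, document similarity and cluster center, appear as `cosine` and `center`.

```python
import math
from collections import Counter

def element_vector(doc, element_weights):
    """Term counts scaled by the (assumed) weight of the element they occur in."""
    vec = Counter()
    for element, text in doc["elements"].items():
        w = element_weights.get(element, 1.0)
        for term in text.lower().split():
            vec[term] += w
    return vec

def slvm_like_vector(doc_id, docs, element_weights, link_weight=0.5):
    """Own structured vector plus an averaged, down-weighted neighbor contribution.
    The mixing rule is an illustrative assumption, not taken from the paper."""
    vec = Counter(element_vector(docs[doc_id], element_weights))
    neighbors = docs[doc_id].get("links", [])
    for n in neighbors:
        for term, value in element_vector(docs[n], element_weights).items():
            vec[term] += link_weight * value / len(neighbors)
    return vec

def cosine(a, b):
    """K-means step 1: similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def center(vectors):
    """K-means step 2: component-wise mean of the member vectors."""
    total = Counter()
    for v in vectors:
        for t, x in v.items():
            total[t] += x
    return {t: x / len(vectors) for t, x in total.items()}

def kmeans(vectors, k, iterations=10):
    """Plain K-means with cosine similarity; the first k vectors seed the centers."""
    centers = [dict(v) for v in vectors[:k]]
    assignment = []
    for _ in range(iterations):
        assignment = [max(range(k), key=lambda c: cosine(v, centers[c]))
                      for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = center(members)
    return assignment

# Toy corpus (made up for illustration): two linked pairs on different topics.
docs = {
    "a": {"elements": {"title": "xml clustering", "body": "documents structure mining"}, "links": ["c"]},
    "b": {"elements": {"title": "pasta recipe", "body": "tomato sauce cooking"}, "links": ["d"]},
    "c": {"elements": {"title": "xml mining", "body": "structure links"}, "links": ["a"]},
    "d": {"elements": {"title": "sauce recipe", "body": "cooking garlic"}, "links": ["b"]},
}
weights = {"title": 2.0, "body": 1.0}  # assumed: title terms count double
ids = ["a", "b", "c", "d"]
vectors = [slvm_like_vector(i, docs, weights) for i in ids]
labels = kmeans(vectors, k=2)
print(dict(zip(ids, labels)))
```

On this toy corpus the two topics share no vocabulary, so documents `a` and `c` end up in one cluster and `b` and `d` in the other. Real data would need proper term weighting (for example TF-IDF) and a better center initialization than taking the first `k` vectors.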