Discovering typical structures of documents: a road map approach
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Clustering transactions using large items
Proceedings of the eighth international conference on Information and knowledge management
ACM Computing Surveys (CSUR)
XClust: clustering XML schemas for effective integration
Proceedings of the eleventh international conference on Information and knowledge management
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth
Proceedings of the 17th International Conference on Data Engineering
Preparations for Semantics-Based XML Mining
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Optimized Substructure Discovery for Semi-structured Data
PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
CLOPE: a fast and effective clustering algorithm for transactional data
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
TreeFinder: a First Step towards XML Data Mining
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
A weighted common structure based clustering technique for XML documents
Journal of Systems and Software
Hi-index | 0.00 |
XML has recently become very popular for representing semi-structured data and a standard for data exchange over the web because of its varied applicability in a number of applications. Therefore, XML documents form an important data mining domain. In this paper, we propose a new XML document clustering technique using sequential pattern mining algorithm. Our approach first extracts the representative structures of frequent patterns from schemaless XML documents by using a sequential pattern mining algorithm. And then, unlike most previous document clustering methods, we apply clustering algorithm for transactional data without a measure of pairwise similarity, considering that an XML document as a transaction and the extracted frequent structures of documents as the items of the transaction. We have experimented our clustering algorithm by comparing it with the previous methods. The experimental results show the effectiveness of the proposed method in performance and in producing clusters with higher cluster cohesion.