XML is becoming a common way of storing data. The elements and their arrangement in a document's hierarchy not only describe the document's structure but also imply the semantic meaning of the data, and hence provide valuable information for developing tools that manipulate XML documents. In this paper, we pursue a data mining approach to the problem of XML document clustering. We introduce a novel XML structural representation called common XPath (CXP), which encodes frequently occurring elements together with their hierarchical information, and propose using the mined CXPs as the feature vectors for XML document clustering. In other words, data mining acts as a feature extractor in the clustering process. Based on this idea, we devise a path-based XML document clustering algorithm called PBClustering, which groups documents according to their CXPs, i.e., their frequent structures. Encouraging simulation results are observed and reported.
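
The idea of using mined paths as clustering features can be illustrated with a minimal sketch. This is not the authors' PBClustering algorithm, and the abstract does not define how CXPs are mined; the sketch assumes, for illustration only, that a CXP can be approximated by a root-to-element tag path that occurs in at least a given fraction of the documents. The function names, the `min_support` parameter, the Jaccard measure, and the sample documents are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(xml_text):
    """Return the set of root-to-element tag paths in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def frequent_paths(docs, min_support):
    """Keep paths found in at least min_support (a fraction) of the documents.

    Assumption: this frequency filter stands in for CXP mining.
    """
    counts = Counter()
    for doc in docs:
        counts.update(element_paths(doc))
    threshold = min_support * len(docs)
    return sorted(p for p, c in counts.items() if c >= threshold)


def feature_vectors(docs, features):
    """Encode each document as a 0/1 vector over the selected paths."""
    vectors = []
    for doc in docs:
        present = element_paths(doc)
        vectors.append([1 if f in present else 0 for f in features])
    return vectors


def jaccard(u, v):
    """Jaccard similarity of two 0/1 vectors (an illustrative choice of measure)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 1.0


# Hypothetical sample documents: two "book" records and two "article" records.
docs = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><author><name/></author><year/></book>",
    "<article><title/><journal/></article>",
    "<article><title/><journal/><pages/></article>",
]
features = frequent_paths(docs, min_support=0.5)
vectors = feature_vectors(docs, features)
```

In this toy input, paths that appear in only one document (such as `/book/year`) fall below the support threshold and are dropped, so the two book documents receive identical feature vectors and share no features with the article documents. Any standard clustering method could then be applied to `vectors`; the choice of method is left open here because the abstract does not specify it.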