KCAM: concentrating on structural similarity for XML fragments

Authors:
Lingbo Kong;Shiwei Tang;Dongqing Yang;Tengjiao Wang;Jun Gao
Affiliations:
Department of Computer Science and Technology, Peking University, Beijing, China;Department of Computer Science and Technology, Peking University, Beijing, China;Department of Computer Science and Technology, Peking University, Beijing, China;Department of Computer Science and Technology, Peking University, Beijing, China;Department of Computer Science and Technology, Peking University, Beijing, China
Venue:
WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management
Year:
2006

Abstract

This paper proposes a new method, KCAM, for measuring the structural similarity of XML fragments that satisfy given keywords. The name comes from the method's key structure, the Keyword Common Ancestor Matrix. The KCAM of an XML fragment is a k × k upper triangular matrix, where k is the number of keywords. Each element a_{i,j} stores the level of the SLCA (Smallest Lowest Common Ancestor) node corresponding to the keywords k_i and k_j. The matrix distance between two KCAMs, denoted KDist(), can be used as an approximate measure of structural similarity. KCAM is independent of the label information in fragments, and it is effective at distinguishing structural differences between XML fragments.
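
The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not fix: each keyword matches exactly one node in a fragment, the root is at level 1, the diagonal entry a_{i,i} holds the level of the keyword node itself, and KDist() is taken to be the Euclidean (Frobenius) distance between the two matrices. The names `lca_level`, `kcam`, `kdist`, and the child-to-parent map representation are hypothetical.

```python
from math import sqrt

def path_from_root(parent, node):
    """Return the nodes on the path from the root down to `node`."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]

def lca_level(parent, u, v):
    """Level (root = 1, an assumption) of the lowest common ancestor of u and v."""
    level = 0
    for a, b in zip(path_from_root(parent, u), path_from_root(parent, v)):
        if a != b:
            break
        level += 1
    return level

def kcam(parent, keyword_nodes):
    """Build the k x k upper triangular matrix of common-ancestor levels.

    keyword_nodes[i] is the node matching keyword k_i (assumed unique).
    Entries below the diagonal are left as 0.
    """
    k = len(keyword_nodes)
    matrix = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            matrix[i][j] = lca_level(parent, keyword_nodes[i], keyword_nodes[j])
    return matrix

def kdist(a, b):
    """Euclidean distance between two KCAMs (an assumed choice of matrix distance)."""
    return sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))

# Two fragments given as child -> parent maps; labels play no role in the result.
# Fragment A: r -> a -> {x, y}, r -> z.  Fragment B: r -> {x, y, z}.
fragment_a = {"r": None, "a": "r", "x": "a", "y": "a", "z": "r"}
fragment_b = {"r": None, "x": "r", "y": "r", "z": "r"}
keywords = ["x", "y", "z"]  # nodes matching k_1, k_2, k_3

kcam_a = kcam(fragment_a, keywords)
kcam_b = kcam(fragment_b, keywords)
distance = kdist(kcam_a, kcam_b)
```

In this example both fragments contain all three keywords, but fragment A nests two of them one level deeper, so the two matrices differ in three entries and the distance is nonzero. Because only levels are stored, renaming the intermediate node `a` would leave the result unchanged, which matches the abstract's claim of label independence.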