Because XML data in internet applications is heterogeneous, exact matching of queries is often inadequate. The need arises to quickly identify subtrees of XML documents in a collection that are similar to a given pattern. Similarity involves both tags, which are not required to coincide, and structure, in which not all the relationships among nodes in the tree are strictly preserved. In this paper we present an efficient approach to the identification of similar subtrees that relies on ad hoc indexing structures. The approach makes it possible to quickly detect, in a heterogeneous document collection, the minimal portions that exhibit some similarity with the pattern. These candidate portions are then ranked according to their actual similarity. The approach supports different notions of similarity and can therefore be customized to different application domains. Three similarity measures are proposed and compared. The approach is validated experimentally, and the experimental results are discussed in detail.
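To make the two-phase idea in the abstract concrete (index lookup to find candidate portions, then ranking), the following is a minimal Python sketch. It is not the paper's algorithm: the `Node` class, the tag-to-node inverted index, the use of a lowest common ancestor as the "minimal portion", the `synonyms` table standing in for non-coinciding tags, and the score (fraction of pattern tags matched) are all assumptions chosen for illustration. The paper's own index structures and its three similarity measures are not reproduced here.

```python
from collections import defaultdict


class Node:
    """Hypothetical in-memory XML element: a tag plus child nodes."""

    def __init__(self, tag, children=()):
        self.tag = tag
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def iter_nodes(root):
    """Yield every node of the tree rooted at `root`."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)


def build_index(docs):
    """Inverted index: lowercased tag -> list of (doc_id, node)."""
    index = defaultdict(list)
    for doc_id, root in docs.items():
        for node in iter_nodes(root):
            index[node.tag.lower()].append((doc_id, node))
    return index


def ancestors(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def lowest_common_ancestor(nodes):
    """Deepest node that is an ancestor-or-self of every node in `nodes`."""
    common = ancestors(nodes[0])
    for node in nodes[1:]:
        seen = {id(a) for a in ancestors(node)}
        common = [a for a in common if id(a) in seen]
    return common[0]


def rank(index, pattern_tags, synonyms=None):
    """Return (score, doc_id, candidate_root_tag) tuples, best first.

    Assumed score: fraction of pattern tags found (directly or via the
    illustrative `synonyms` table) inside the candidate portion.
    """
    synonyms = synonyms or {}
    hits = defaultdict(lambda: defaultdict(list))  # doc -> pattern tag -> nodes
    for tag in pattern_tags:
        labels = {tag.lower(), *synonyms.get(tag.lower(), ())}
        for label in labels:
            for doc_id, node in index.get(label, ()):
                hits[doc_id][tag].append(node)

    results = []
    for doc_id, by_tag in hits.items():
        matched = [n for nodes in by_tag.values() for n in nodes]
        root = lowest_common_ancestor(matched)  # one candidate per document
        score = len(by_tag) / len(pattern_tags)
        results.append((score, doc_id, root.tag))
    return sorted(results, key=lambda r: (-r[0], r[1]))
```

Calling `rank(build_index(docs), ["title", "author"])` on a dictionary of document trees returns one candidate per matching document. A fuller treatment would return several candidate portions per document and would also score how well the pattern's structure is preserved, which this sketch ignores.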