This paper reports on SIRIUS, a lightweight indexing and search engine for XML documents. The implemented retrieval approach is document-oriented and relies on approximate matching of both the structure and the textual content. Instead of matching whole DOM trees, SIRIUS splits the document object model into a set of paths. In this view, a request is a path-like expression with conditions on attribute values. First, we present the main functionalities and characteristics of this XML IR system; second, we report on our experience adapting and using it for the INEX 2005 ad-hoc retrieval task. Finally, we present and analyze the retrieval performance SIRIUS obtained in the INEX 2005 evaluation campaign, and show that, despite its lightweight design, SIRIUS retrieved highly relevant, non-overlapping XML elements and achieved fairly good precision at low recall levels.
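The abstract does not spell out SIRIUS's exact matching scheme, so the following is only a minimal sketch of the general idea: split a DOM tree into root-to-element paths, then compare a path-like query against each path with an edit distance. The function names (`split_into_paths`, `path_edit_distance`, `rank_paths`), the unit edit costs, and the rule that attribute conditions apply to the last element of a path are all hypothetical choices made for illustration, not the system's actual implementation.

```python
import xml.etree.ElementTree as ET


def split_into_paths(xml_text):
    """Split a document into (tag_path, steps, text) triples, one per element with text.

    Each step is (tag, attributes). This is a hypothetical decomposition, not SIRIUS's.
    """
    root = ET.fromstring(xml_text)
    paths = []

    def walk(elem, prefix):
        current = prefix + [(elem.tag, dict(elem.attrib))]
        text = (elem.text or "").strip()
        if text:
            paths.append((tuple(tag for tag, _ in current), current, text))
        for child in elem:
            walk(child, current)

    walk(root, [])
    return paths


def path_edit_distance(a, b):
    """Levenshtein distance over tag sequences, assuming unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, 1):
        curr = [i]
        for j, tb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,            # delete ta
                            curr[j - 1] + 1,        # insert tb
                            prev[j - 1] + (ta != tb)))  # substitute (or match)
        prev = curr
    return prev[-1]


def rank_paths(query_tags, paths, required_attrs=None):
    """Rank path texts by distance to the query.

    Assumption: attribute conditions must hold on the last element of each path.
    """
    required = required_attrs or {}
    scored = []
    for tags, steps, text in paths:
        leaf_attrs = steps[-1][1]
        if all(leaf_attrs.get(k) == v for k, v in required.items()):
            scored.append((path_edit_distance(query_tags, tags), text))
    scored.sort(key=lambda pair: pair[0])
    return scored


doc = ('<article><fm><title>XML retrieval</title></fm>'
       '<bdy><sec><p lang="en">Paths and trees</p></sec></bdy></article>')
paths = split_into_paths(doc)
print(rank_paths(("article", "sec", "p"), paths, {"lang": "en"}))
```

Here the query path `article/sec/p` matches `article/bdy/sec/p` at distance 1 (one deleted step), which is why approximate path matching can tolerate intermediate elements the user did not specify.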