Processing content-oriented XPath queries

Authors:
Börkur Sigurbjörnsson;Jaap Kamps;Maarten de Rijke
Affiliations:
University of Amsterdam, The Netherlands;University of Amsterdam, The Netherlands;University of Amsterdam, The Netherlands
Venue:
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Year:
2004

Abstract

Document-centric XML collections contain text-rich documents, marked up with XML tags that add lightweight semantics to the text. Querying such collections calls for a hybrid query language: the text-rich nature of the documents suggests a content-oriented (IR) approach, while the mark-up allows users to add structural constraints to their IR queries. Hybrid queries tend to be more expressive, which should, in principle, lead to better retrieval performance. In practice, processing these hybrid queries within an IR system turns out to be far from trivial, because a delicate balance between structural and content information must be struck. We propose an approach to processing such hybrid content-and-structure queries that decomposes a query into multiple content-only queries, whose results are then combined in ways determined by the structural constraints of the original query. We evaluate our methods using the INEX 2003 test suite and show (1) that processing content-oriented XPath queries effectively is non-trivial, (2) that effectiveness differs across topic types, but (3) that with appropriate processing methods retrieval effectiveness can improve.
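
To make the decomposition idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes an about()-style predicate syntax for the content parts of the query, a toy term-overlap scorer, and multiplication as the way to combine scores; the names Element, decompose, score and run, as well as the sample data, are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    path: str  # hypothetical element identifier, e.g. "/article[1]/sec[2]"
    tag: str
    text: str


# Assumed query syntax: //tag[about(., 'terms')] steps chained together.
ABOUT = re.compile(r"//(\w+)\[about\(\.,\s*'([^']*)'\)\]")


def decompose(query):
    """Split a hybrid query into content-only parts: (tag, terms) pairs."""
    return [(tag, terms.lower().split()) for tag, terms in ABOUT.findall(query)]


def score(element, terms):
    """Toy content-only score: fraction of query terms found in the text."""
    words = set(element.text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def run(query, elements):
    """Rank target elements, weighting each by its best-matching ancestors."""
    *context, (target_tag, target_terms) = decompose(query)
    results = []
    for el in elements:
        if el.tag != target_tag:
            continue
        total = score(el, target_terms)
        for tag, terms in context:
            # Structural constraint: only ancestors with the required tag count.
            ancestors = [a for a in elements
                         if a.tag == tag and el.path.startswith(a.path + "/")]
            # Assumption: combine by multiplying with the best ancestor score.
            total *= max((score(a, terms) for a in ancestors), default=0.0)
        results.append((el.path, total))
    return sorted(results, key=lambda r: r[1], reverse=True)


ELEMENTS = [
    Element("/article[1]", "article", "xml retrieval with structured queries"),
    Element("/article[1]/sec[1]", "sec", "query processing for element retrieval"),
    Element("/article[1]/sec[2]", "sec", "related work"),
    Element("/article[2]", "article", "image compression"),
    Element("/article[2]/sec[1]", "sec", "query processing in databases"),
]
QUERY = "//article[about(., 'xml retrieval')]//sec[about(., 'query processing')]"

ranked = run(QUERY, ELEMENTS)
for path, value in ranked:
    print(path, value)
```

In this sketch the section in the second article matches its own content-only query but is ranked at zero because its containing article does not match, which illustrates how a strict combination lets structure override content. A softer combination, such as a weighted sum, would be another way to balance the two.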