Hybrid XML Retrieval: Combining Information Retrieval and a Native XML Database

Authors:
Jovan Pehcevski; James A. Thom; Anne-Marie Vercoustre
Affiliations:
RMIT University, Melbourne, Australia; RMIT University, Melbourne, Australia; INRIA, Rocquencourt, France
Venue:
Information Retrieval
Year:
2005

Abstract

This paper investigates the impact of three approaches to XML retrieval: using Zettair, a full-text information retrieval system; using eXist, a native XML database; and using a hybrid system that takes full-article answers from Zettair and uses eXist to extract elements from those articles. For the content-only topics, we undertake a preliminary analysis of the INEX 2003 relevance assessments to identify the types of highly relevant document components. Further analysis identifies two complementary sub-cases of relevance assessments (General and Specific) and two categories of topics (Broad and Narrow). We develop a novel retrieval module that, for a content-only topic, uses the answer list returned by a native XML database to dynamically determine the preferable units of retrieval, which we call Coherent Retrieval Elements. Our experiments show that, when each of the three systems is evaluated against different retrieval scenarios (different cases of relevance assessments, different topic categories, and different choices of evaluation metrics), the systems exhibit varying behaviour and reach their best performance at different values of the retrieval parameters. For the INEX 2003 relevance assessments on the content-only topics, our newly developed hybrid XML retrieval system is substantially more effective than either Zettair or eXist alone, and yields robust and very effective XML retrieval.
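
The two-stage hybrid idea described in the abstract (rank whole articles first, then extract elements only from the top-ranked articles) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the term-count scorer stands in for a full-text engine and is not Zettair's ranking, the `xml.etree.ElementTree` traversal stands in for an XML database query and is not eXist's API, and the names `score_text`, `rank_articles`, `extract_elements`, `hybrid_search` and the `ARTICLES` sample are hypothetical. It does not implement the Coherent Retrieval Elements heuristic.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def score_text(text, query_terms):
    """Toy relevance score: total occurrences of the query terms (an assumption)."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query_terms)


def rank_articles(articles, query_terms, top_k=2):
    """Stage 1 (full-text stand-in): rank whole articles, keep the top_k matching ids."""
    scored = []
    for art_id, xml_src in articles.items():
        full_text = " ".join(ET.fromstring(xml_src).itertext())
        scored.append((score_text(full_text, query_terms), art_id))
    # Highest score first; ties broken by article id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [art_id for score, art_id in scored[:top_k] if score > 0]


def extract_elements(xml_src, query_terms):
    """Stage 2 (XML database stand-in): (tag, score) for each element containing a query term."""
    hits = []
    for elem in ET.fromstring(xml_src).iter():
        score = score_text(" ".join(elem.itertext()), query_terms)
        if score > 0:
            hits.append((elem.tag, score))
    return hits


def hybrid_search(articles, query_terms, top_k=2):
    """Extract elements only from the articles that stage 1 ranked highest."""
    results = []
    for art_id in rank_articles(articles, query_terms, top_k):
        for tag, score in extract_elements(articles[art_id], query_terms):
            results.append((art_id, tag, score))
    results.sort(key=lambda r: (-r[2], r[0], r[1]))
    return results


# Hypothetical sample collection, not taken from the INEX test set.
ARTICLES = {
    "a1": "<article><sec><p>xml retrieval with xml databases</p></sec>"
          "<sec><p>unrelated text</p></sec></article>",
    "a2": "<article><sec><p>cooking recipes</p></sec></article>",
    "a3": "<article><sec><p>retrieval models</p></sec></article>",
}

if __name__ == "__main__":
    for row in hybrid_search(ARTICLES, ["xml", "retrieval"]):
        print(row)
```

In this sketch the nested `article`, `sec` and `p` elements of a matching article are all returned with the same score, which is the kind of overlapping answer list from which a unit-selection step, such as the one the abstract calls Coherent Retrieval Elements, would have to choose.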