XML Retrieval: DB/IR in theory, web in practice

Authors:
Sihem Amer-Yahia;Ricardo Baeza-Yates;Mariano P. Consens;Mounia Lalmas
Affiliations:
Yahoo! Research, New York;Yahoo! Research Barcelona and Latinamerica;University of Toronto, Canada;Queen Mary University of London, UK
Venue:
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Year:
2007

Abstract

The world of data has developed from two main points of view: the structured relational data model and the unstructured text model. The two distinct cultures of databases (DB) and information retrieval (IR) now have a natural meeting place in the Web, with its semi-structured XML model. Data in digital libraries and in enterprise environments also shares many of the semi-structured characteristics of web data. As web-style searching becomes a ubiquitous tool, the need to integrate these two viewpoints becomes even more important. In particular, we consider the application of DB and IR research to querying Web data in the context of online communities. With Web 2.0, the question arises: how can search interfaces remain simple when users are allowed to contribute content (Wikipedia), share it (Flickr), and rate it (YouTube)? Or when they can decide who their friends are (del.icio.us), what they like to see, and how they want it to look (MySpace)? While we want to keep the user interface simple (keyword search), we would like to study the applicability of querying structure and content in a context where new forms of data-driven dynamic web content (e.g., user feedback, tags, contributed multimedia) are provided. This tutorial will provide an overview of the different issues and approaches put forward by the IR and DB communities, and survey the DB-IR integration efforts as they focus on the problem of retrieval from XML content. Querying content in online communities is an excellent example of such an application. Both earlier and recent proposals will be discussed. A variety of application scenarios for XML retrieval will be covered, including examples of current tools and techniques.