XML Retrieval: DB/IR in theory, web in practice

  • Authors:
  • Sihem Amer-Yahia; Ricardo Baeza-Yates; Mariano P. Consens; Mounia Lalmas

  • Affiliations:
  • Yahoo! Research, New York; Yahoo! Research Barcelona and Latinamerica; University of Toronto, Canada; Queen Mary University of London, UK

  • Venue:
  • VLDB '07 Proceedings of the 33rd international conference on Very large data bases
  • Year:
  • 2007

Abstract

The world of data has been developed from two main points of view: the structured relational data model and the unstructured text model. The two distinct cultures of databases and information retrieval now have a natural meeting place in the Web with its semi-structured XML model. Data in Digital Libraries and in Enterprise Environments also shares many of the semi-structured characteristics of web data. As web-style searching becomes a ubiquitous tool, the need for integrating these two viewpoints becomes even more important. In particular, we consider the application of DB and IR research to querying Web data in the context of online communities. With Web 2.0, the question arises: how can search interfaces remain simple when users are allowed to contribute content (Wikipedia), share it (Flickr), and rate it (YouTube)? When they can decide who their friends are (del.icio.us), what they like to see, and how they want it to look (MySpace)? While we want to keep the user interface simple (keyword search), we would like to study the applicability of querying structure and content to a context where new forms of data-driven dynamic web content (e.g., user feedback, tags, contributed multimedia) are provided. This tutorial will provide an overview of the different issues and approaches put forward by the IR and DB communities and survey the DB-IR integration efforts as they focus on the problem of retrieval from XML content. In particular, the context of querying content in online communities is an excellent example of such an application. Both earlier and recent proposals will be discussed. A variety of application scenarios for XML Retrieval will be covered, including examples of current tools and techniques.