Keyword search across databases and documents

  • Authors: Carlos Garcia-Alvarado; Carlos Ordonez
  • Affiliations: University of Houston, Houston, TX (both authors)
  • Venue: Proceedings of the 2nd International Workshop on Keyword Search on Structured Data
  • Year: 2010

Abstract

Given the continuous growth of databases and the abundance of diverse files in modern IT environments, there is a pressing need to integrate keyword search across heterogeneous information sources. One case in which such integration is needed arises when a collection of documents (e.g., word processing documents, spreadsheets, and text files) is derived directly from a central database, and both repositories are then updated independently. Because the two are only loosely connected, finding hidden relationships between the documents and the database is difficult. The problem is especially complicated when database integration techniques must be extended to handle semi-structured data (i.e., documents). Our research focuses on exploiting a relational database system to integrate and explore complex interrelationships between a database and a collection of potentially related documents. Specifically, we focus on discovering and ranking keyword links (relationships) between a database schema and a collection of documents at different levels of granularity. We adapt, extend, and combine information retrieval techniques and incorporate them into the DBMS. On this basis, we provide algorithms for efficiently exploring the relationships discovered between a collection of documents and a DBMS. We show experimentally that our system can discover, query, and rank complex relationships between a database and its surrounding documents.
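
To make the idea of ranked keyword links concrete, the following is a minimal illustrative sketch and not the authors' algorithm, which the abstract does not specify. It assumes a toy schema given as a dictionary of table and column names, treats those names as the keywords, and scores each link to a document with a simple TF-IDF weight. The function names (tokenize, find_keyword_links), the two granularity levels ("table" and "column"), and the scoring formula are all assumptions introduced here for illustration; the sketch also runs outside the DBMS, whereas the paper works inside it.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_keyword_links(schema, documents):
    """Return keyword links between schema elements and documents.

    schema:    {table_name: [column_name, ...]}  (hypothetical input shape)
    documents: {doc_id: text}
    Each link is (score, level, db_element, doc_id, keyword), sorted by
    descending score. The score is an assumed TF-IDF weight.
    """
    doc_terms = {d: Counter(tokenize(t)) for d, t in documents.items()}
    n_docs = len(documents)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for terms in doc_terms.values():
        df.update(terms.keys())

    # Schema elements at two granularity levels: table and column.
    elements = []
    for table, columns in schema.items():
        elements.append(("table", table, table))
        for col in columns:
            elements.append(("column", f"{table}.{col}", col))

    links = []
    for level, name, source in elements:
        for kw in set(tokenize(source)):
            for doc_id, terms in doc_terms.items():
                tf = terms.get(kw, 0)
                if tf:
                    idf = math.log(1 + n_docs / df[kw])
                    links.append((tf * idf, level, name, doc_id, kw))

    # Highest score first; ties broken by element name, then document id.
    links.sort(key=lambda l: (-l[0], l[2], l[3]))
    return links


if __name__ == "__main__":
    schema = {"customer": ["name", "city"]}
    documents = {
        "report.txt": "Customer list: each customer has a city.",
        "notes.txt": "Meeting notes about city budgets.",
    }
    for link in find_keyword_links(schema, documents):
        print(link)
```

In this toy setting, the table name "customer" links most strongly to report.txt because the term occurs twice there and in no other document, while "city" links weakly to both documents. A fuller treatment would presumably also match stored data values, not only schema names.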