Information retrieval from digital libraries in SQL

Authors:
Carlos Garcia-Alvarado;Carlos Ordonez
Affiliations:
University of Houston, Houston, TX, USA;University of Houston, Houston, TX, USA
Venue:
Proceedings of the 10th ACM workshop on Web information and data management
Year:
2008

References (14):
Another stemmer. ACM SIGIR Forum.
A vector space model for automatic indexing. Communications of the ACM.
Query optimization for vector space problems. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval.
PowerDB-IR: information retrieval on top of a database cluster. Proceedings of the tenth international conference on Information and knowledge management.
SQL text parsing for information retrieval. Proceedings of the twelfth international conference on Information and knowledge management (CIKM '03).
A formal study of information retrieval heuristics. Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval.
PowerDB-IR – Scalable Information Retrieval and Storage with a Cluster of Databases. Knowledge and Information Systems.
Information Retrieval: Algorithms and Heuristics (The Kluwer International Series on Information Retrieval).
Optimizing recursive queries in SQL. Proceedings of the 2005 ACM SIGMOD international conference on Management of data.
The FedLemur project: Federated search in the real world. Journal of the American Society for Information Science and Technology.
Inverted files for text search engines. ACM Computing Surveys (CSUR).
Effective keyword search in relational databases. Proceedings of the 2006 ACM SIGMOD international conference on Management of data.
Load balancing distributed inverted files. Proceedings of the 9th annual ACM international workshop on Web information and data management.
Referential integrity quality metrics. Decision Support Systems.

Cited by (5):
DBDOC: querying and browsing databases and interrelated documents. Proceedings of the First International Workshop on Keyword Search on Structured Data.
Keyword search across databases and documents. Proceedings of the 2nd International Workshop on Keyword Search on Structured Data.
Query recommendation in digital libraries using OLAP. Proceedings of the 2nd International Workshop on Keyword Search on Structured Data.
Integrating and querying web databases and documents. Proceedings of the 20th ACM international conference on Information and knowledge management.
Extending information unit across media streams for improving retrieval effectiveness. Data & Knowledge Engineering.

Abstract

Information retrieval techniques have traditionally been applied outside relational database systems because of storage overhead, the complexity of programming them inside the database system, and the slow performance of SQL implementations. This project supports the claim that digital libraries can be searched and queried with information retrieval models inside a relational database system using optimized SQL queries and User-Defined Functions. We propose several techniques divided into two phases: storage and retrieval. The storage phase performs document preprocessing, stop-word removal, and term extraction; the retrieval phase implements three fundamental IR models, namely the popular Vector Space Model, the Okapi Probabilistic Model, and the Dirichlet Prior Language Model. We conduct experiments on article abstracts from the DBLP bibliography and the ACM Digital Library. We evaluate several query optimizations, compare the on-demand and static weighting approaches, and study the performance of conjunctive and disjunctive queries under the three ranking models. Our prototype exhibited linear scalability and satisfactory performance on medium-sized document collections, and our implementation of the Vector Space Model is competitive with the other two models.
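The abstract describes a storing phase (preprocessing, stop-word removal, term extraction) followed by ranking with three IR models evaluated in SQL with the help of User-Defined Functions. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: Python's built-in `sqlite3` module stands in for the DBMS, and the table names (`vocab_doc`, `doc_len`, `query_terms`), the UDF names `log_udf`/`sqrt_udf`, the stop-word list, the toy corpus (titles of works listed in this record), and the defaults `k1 = 1.2`, `b = 0.75`, `mu = 2000` are all hypothetical choices. The Vector Space Model is scored as a cosine with a binary query vector, Okapi uses the common non-negative BM25 idf variant, and the Dirichlet model uses its rank-equivalent form. A conjunctive query keeps only documents that match every query term (a `HAVING` clause); a disjunctive one keeps any match.

```python
import math
import re
import sqlite3
from collections import Counter

# Hypothetical stop-word list and toy corpus, for illustration only.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}
DOCS = {
    1: "A vector space model for automatic indexing",
    2: "Effective keyword search in relational databases",
    3: "Information retrieval on top of a database cluster",
    4: "SQL text parsing for information retrieval",
}


def tokenize(text):
    """Preprocessing: lower-case, extract alphanumeric terms, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def build_index(docs):
    """Storing phase: one (doc_id, term, tf) row per distinct term of each document."""
    conn = sqlite3.connect(":memory:")
    # Python-backed UDFs (hypothetical names), so SQLite's optional math functions are not needed.
    conn.create_function("log_udf", 1, math.log)
    conn.create_function("sqrt_udf", 1, math.sqrt)
    conn.execute("CREATE TABLE vocab_doc (doc_id INTEGER, term TEXT, tf INTEGER,"
                 " PRIMARY KEY (doc_id, term))")
    conn.execute("CREATE TABLE query_terms (term TEXT PRIMARY KEY)")
    for doc_id, text in docs.items():
        conn.executemany("INSERT INTO vocab_doc VALUES (?, ?, ?)",
                         [(doc_id, t, n) for t, n in Counter(tokenize(text)).items()])
    conn.execute("CREATE TABLE doc_len AS"
                 " SELECT doc_id, SUM(tf) AS len FROM vocab_doc GROUP BY doc_id")
    return conn


# Collection statistics, on-demand tf-idf weights, and matched (document, query term) rows.
COMMON = """
WITH stats AS (SELECT COUNT(*) AS num_docs, AVG(len) AS avg_len, SUM(len) AS total_len
               FROM doc_len),
     df AS (SELECT term, COUNT(*) AS df, SUM(tf) AS cf FROM vocab_doc GROUP BY term),
     weights AS (SELECT v.doc_id, v.term, v.tf * log_udf(1.0 * s.num_docs / df.df) AS w
                 FROM vocab_doc v JOIN df ON df.term = v.term CROSS JOIN stats s),
     norms AS (SELECT doc_id, sqrt_udf(SUM(w * w)) AS norm FROM weights GROUP BY doc_id),
     hits AS (SELECT v.doc_id, v.tf, d.len, df.df, df.cf, wt.w, nm.norm
              FROM vocab_doc v
              JOIN query_terms q ON q.term = v.term
              JOIN doc_len d ON d.doc_id = v.doc_id
              JOIN df ON df.term = v.term
              JOIN weights wt ON wt.doc_id = v.doc_id AND wt.term = v.term
              JOIN norms nm ON nm.doc_id = v.doc_id)
"""

SCORES = {
    # VSM: cosine with a binary query vector; the constant query norm is dropped.
    "vsm": "SUM(h.w / h.norm)",
    # Okapi BM25 with the non-negative idf ln(1 + (N - df + 0.5) / (df + 0.5)).
    "okapi": "SUM(log_udf(1 + (s.num_docs - h.df + 0.5) / (h.df + 0.5))"
             " * h.tf * (:k1 + 1) / (h.tf + :k1 * (1 - :b + :b * h.len / s.avg_len)))",
    # Dirichlet prior LM, rank-equivalent form:
    # sum over matched terms of ln(1 + tf / (mu * p(t|C))) + |q| * ln(mu / (len + mu)).
    "dirichlet": "SUM(log_udf(1 + h.tf / (:mu * h.cf / s.total_len)))"
                 " + (SELECT COUNT(*) FROM query_terms) * log_udf(:mu / (h.len + :mu))",
}


def rank(conn, model, query, conjunctive=False, k1=1.2, b=0.75, mu=2000.0):
    """Retrieval phase: return (doc_id, score) pairs, best first."""
    conn.execute("DELETE FROM query_terms")
    conn.executemany("INSERT OR IGNORE INTO query_terms VALUES (?)",
                     [(t,) for t in tokenize(query)])
    # Conjunctive: a document must match every distinct query term.
    having = "HAVING COUNT(*) = (SELECT COUNT(*) FROM query_terms)" if conjunctive else ""
    sql = (f"{COMMON} SELECT h.doc_id, {SCORES[model]} AS score"
           f" FROM hits h CROSS JOIN stats s GROUP BY h.doc_id, h.len {having}"
           " ORDER BY score DESC, h.doc_id")
    return conn.execute(sql, {"k1": k1, "b": b, "mu": mu}).fetchall()


if __name__ == "__main__":
    conn = build_index(DOCS)
    for model in SCORES:
        print(model, rank(conn, model, "information retrieval SQL"),
              rank(conn, model, "information retrieval SQL", conjunctive=True))
```

On this toy corpus, all three models rank document 4 above document 3 for the query "information retrieval SQL", and the conjunctive form returns only document 4. Here the weights are computed on demand inside every query; a static-weighting variant would instead materialize the `weights` CTE as a table when documents are loaded, trading storage and update cost for cheaper queries.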