Information retrieval techniques have traditionally been implemented outside relational database systems because of storage overhead, the complexity of programming them inside the database system, and the slow performance of SQL implementations. This project supports the idea that searching and querying digital libraries with information retrieval models can be performed inside a relational database system using optimized SQL queries and User-Defined Functions. We propose several techniques divided into two phases: storing and retrieving. The storing phase covers document pre-processing, stop-word removal, and term extraction. The retrieval phase is implemented with three fundamental IR models: the popular Vector Space Model, the Okapi Probabilistic Model, and the Dirichlet Prior Language Model. We conduct experiments using article abstracts from the DBLP bibliography and the ACM Digital Library. We evaluate several query optimizations, compare the on-demand and static weighting approaches, and study the performance of conjunctive and disjunctive queries under the three ranking models. Our prototype scaled linearly and performed satisfactorily on medium-sized document collections. Our implementation of the Vector Space Model is competitive with the other two models.
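As a rough illustration of the two phases, the sketch below stores term frequencies in a relational table and ranks documents for a disjunctive query with a single SQL statement. It is not the prototype described here: the table name `tf`, the function names, the tiny stop-word list, the use of SQLite, and the simplified tf-idf score (no cosine normalization) are all assumptions made for the example. The inverse document frequency is computed at query time, which loosely corresponds to the on-demand weighting approach, and the logarithm is registered as a user-defined function named `udf_ln`.

```python
import math
import re
import sqlite3

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "with"}


def connect():
    """Open an in-memory SQLite database and register a natural-log UDF."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("udf_ln", 1, math.log)
    return conn


def tokenize(text):
    """Pre-processing: lowercase, split on non-letters, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_index(conn, docs):
    """Storing phase: one row per (document, term) with its frequency."""
    conn.execute(
        "CREATE TABLE tf (doc_id INTEGER, term TEXT, freq INTEGER, "
        "PRIMARY KEY (doc_id, term))"
    )
    for doc_id, text in docs.items():
        counts = {}
        for term in tokenize(text):
            counts[term] = counts.get(term, 0) + 1
        conn.executemany(
            "INSERT INTO tf VALUES (?, ?, ?)",
            [(doc_id, term, freq) for term, freq in counts.items()],
        )


def rank(conn, query):
    """Retrieval phase: disjunctive query scored as sum of tf * ln(N / df)."""
    terms = tokenize(query)
    if not terms:
        return []
    placeholders = ",".join("?" * len(terms))
    sql = f"""
        SELECT tf.doc_id,
               SUM(tf.freq * udf_ln(1.0 * n.total / df.cnt)) AS score
        FROM tf
        JOIN (SELECT term, COUNT(*) AS cnt FROM tf GROUP BY term) AS df
          ON df.term = tf.term
        CROSS JOIN (SELECT COUNT(DISTINCT doc_id) AS total FROM tf) AS n
        WHERE tf.term IN ({placeholders})
        GROUP BY tf.doc_id
        ORDER BY score DESC, tf.doc_id
    """
    return conn.execute(sql, terms).fetchall()
```

Calling `build_index(conn, docs)` on a dictionary that maps document ids to abstract text, followed by `rank(conn, "vector space")`, returns `(doc_id, score)` pairs for every document containing at least one query term. A conjunctive variant could add `HAVING COUNT(*) = ?` with the number of distinct query terms, and a static weighting variant would precompute the weights into a table instead of deriving them in the query.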