Our current concern is a scalable infrastructure for information retrieval (IR) that delivers up-to-date retrieval results in the presence of frequent, continuous updates. Timely processing of updates is important in novel application domains, e.g., e-commerce. We want to use off-the-shelf hardware and software as much as possible. These issues are challenging, given the additional requirement that the resulting system must scale well. We have built PowerDB-IR, a system that has the characteristics sought. This paper describes its design, implementation, and evaluation. PowerDB-IR is a coordination layer for a database cluster. The rationale behind a database cluster is to 'scale out', i.e., to add further cluster nodes whenever better performance is needed. We build on IR-to-database mappings and service decomposition to support high-level parallelism. We follow a three-tier architecture with the database cluster as the bottom layer for storage management. The middle tier provides IR-specific processing and update services. PowerDB-IR has the following features: It allows documents to be inserted and retrieved concurrently, and it ensures freshness with almost no overhead. Alternative physical data organization schemes provide adequate performance for different workloads. Query processing techniques for the different data organizations efficiently integrate the ranked retrieval results from the cluster nodes. We have run extensive experiments with our prototype using commercial database systems and middleware products. The main result is that PowerDB-IR shows surprisingly ideal scalability and low response times.
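To make the integration of ranked results from the cluster nodes more concrete, the following is a minimal sketch and not the PowerDB-IR implementation. It assumes one possible data organization in which each node holds a disjoint subset of the documents and returns its local hits sorted by descending score, and it further assumes that scores are comparable across nodes. The names `Hit`, `merge_ranked`, and the sample node lists are hypothetical.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

# Hypothetical result type: (relevance score, document id).
Hit = Tuple[float, str]


def merge_ranked(node_results: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Merge per-node hit lists into a global top-k.

    Assumption: every input list is already sorted by descending score
    and scores from different nodes are directly comparable.
    """
    # heapq.merge expects ascending order, so the score is negated in the key.
    merged = heapq.merge(*node_results, key=lambda hit: -hit[0])
    # The merge is lazy, so only the first k hits are actually consumed.
    return list(islice(merged, k))


# Illustrative data only; these are not results from the paper.
node_a = [(0.92, "doc17"), (0.40, "doc03")]
node_b = [(0.88, "doc42"), (0.75, "doc08")]
node_c = [(0.61, "doc29")]

top3 = merge_ranked([node_a, node_b, node_c], k=3)
print(top3)
```

Because the merge is lazy, a coordinator written this way would touch only as many hits per node as the global top-k requires. Other data organizations, such as partitioning by term rather than by document, would need a different combination step.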