Efficiency and effectiveness of query processing in cluster-based retrieval

Authors:
Fazli Can; Ismail Sengör Altingövde; Engin Demir
Affiliations:
Computer Science and Systems Analysis Department, Miami University, Oxford, OH; Computer Engineering Department, Bilkent University, Bilkent, Ankara 06533, Turkey; Computer Engineering Department, Bilkent University, Bilkent, Ankara 06533, Turkey
Venue:
Information Systems
Year:
2004

Abstract

Our research shows that, for large databases and without considerable additional storage overhead, cluster-based retrieval (CBR) can compete with the time efficiency and effectiveness of inverted index-based full search (FS). The proposed CBR method employs a storage structure that blends cluster membership information into the posting lists of the inverted file. This approach significantly reduces the cost of the similarity calculations needed for document ranking during query processing, and thereby improves efficiency. For example, in terms of in-memory computations, the new approach can reduce query processing time to 39% of that of FS. The experiments confirm that the approach is scalable and that system performance improves with increasing database size. In the experiments, we use the cover coefficient-based clustering methodology (C³M) and the Financial Times database of TREC, which contains 210 158 documents (564 MB) defined by 229 748 terms, with a total of 29 545 234 inverted index elements. This study provides CBR efficiency and effectiveness experiments using the largest corpus in an environment that employs no user interaction or user behavior assumption for clustering.
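
The idea of blending cluster membership into posting lists can be illustrated with a minimal sketch. It is not the paper's storage layout or similarity measure: the function names (`build_cluster_grouped_index`, `rank_in_clusters`), the toy documents, and the raw term-frequency score are all assumptions chosen for illustration, and the set of selected clusters is taken as given rather than computed. Each term's postings are grouped by cluster, so query processing can skip whole blocks that belong to clusters outside the selected set.

```python
from collections import defaultdict


def build_cluster_grouped_index(docs, cluster_of):
    """Build an inverted index whose posting lists are grouped by cluster.

    docs:       {doc_id: {term: term_frequency}}  (hypothetical toy input)
    cluster_of: {doc_id: cluster_id}
    Returns {term: [(cluster_id, [(doc_id, tf), ...]), ...]}, with blocks
    sorted by cluster_id.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for doc_id in sorted(docs):
        for term, tf in docs[doc_id].items():
            grouped[term][cluster_of[doc_id]].append((doc_id, tf))
    return {term: sorted(blocks.items()) for term, blocks in grouped.items()}


def rank_in_clusters(index, query_terms, selected_clusters, top_k=10):
    """Score only documents in the selected clusters.

    The score is a plain sum of term frequencies, a placeholder for a real
    similarity measure. Also returns the number of postings touched, as a
    rough proxy for similarity-computation cost.
    """
    scores = defaultdict(int)
    touched = 0
    for term in query_terms:
        for cluster_id, postings in index.get(term, []):
            if cluster_id not in selected_clusters:
                continue  # skip the whole block for an unselected cluster
            for doc_id, tf in postings:
                scores[doc_id] += tf
                touched += 1
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return ranked, touched


# Toy collection: four documents in two clusters (made-up data).
docs = {
    1: {"cluster": 2, "index": 1},
    2: {"cluster": 1},
    3: {"index": 3},
    4: {"cluster": 1, "index": 1},
}
cluster_of = {1: 0, 2: 0, 3: 1, 4: 1}

index = build_cluster_grouped_index(docs, cluster_of)
query = ["cluster", "index"]

cbr_ranked, cbr_touched = rank_in_clusters(index, query, {0})
fs_ranked, fs_touched = rank_in_clusters(index, query, {0, 1})
print(cbr_ranked, cbr_touched)
print(fs_ranked, fs_touched)
```

In this toy run, restricting the query to cluster 0 touches 3 postings, while processing every cluster (the full-search analogue) touches 6. The sketch only shows where the savings in similarity calculations come from; it says nothing about the magnitude reported in the abstract.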