Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases

Authors:
Fazli Can; Esen A. Ozkarahan
Affiliations:
Miami Univ., Oxford, OH; Pennsylvania State Univ., Erie
Venue:
ACM Transactions on Database Systems (TODS)
Year:
1990



Abstract

A new algorithm for document clustering is introduced. The algorithm is based on the cover coefficient (CC) concept, which provides a means of analytically estimating the number of clusters within a document database and of relating indexing and clustering. The CC concept is also used to identify the cluster seeds and to form clusters around these seeds. It is shown that the complexity of the clustering process is very low. The retrieval experiments show that the information-retrieval effectiveness of the algorithm is compatible with that of a very demanding complete linkage clustering method that is known to have good retrieval performance. The experiments also show that the algorithm is 15.1 to 63.5 percent (47.5 percent on average) better than four other clustering algorithms in cluster-based information retrieval. The experiments have validated the indexing-clustering relationships and the complexity of the algorithm and have shown improvements in retrieval effectiveness. Two document databases are used in the experiments: TODS214 and INSPEC. The latter is a common database with 12,684 documents.
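
The abstract does not give the formulas behind the CC concept, so the following is only a rough sketch of how the cluster-count estimate and seed selection might be computed. It rests on assumptions that do not come from this page: a binary document-term matrix D, a cover coefficient c_ij defined as a two-stage probability (pick a term of document i at random, then pick a document containing that term at random), the diagonal entry c_ii treated as the document's decoupling coefficient, the number of clusters estimated as the sum of the diagonal, and a seed power of c_ii * (1 - c_ii) * (number of terms in document i). The function names and the toy matrix are hypothetical.

```python
def cover_coefficient_matrix(D):
    """Return the m x m matrix C for a document-term matrix D (list of rows).

    Assumed definition: c_ij = (1 / row_sum_i) * sum_k d_ik * d_jk / col_sum_k.
    Every document must contain a term and every term must occur somewhere.
    """
    m, n = len(D), len(D[0])
    row_sums = [sum(row) for row in D]
    col_sums = [sum(D[i][k] for i in range(m)) for k in range(n)]
    C = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            total = 0.0
            for k in range(n):
                total += D[i][k] * D[j][k] / col_sums[k]
            C[i][j] = total / row_sums[i]
    return C


def estimate_cluster_count(C):
    """Assumed estimate: the sum of the diagonal (decoupling) entries of C."""
    return sum(C[i][i] for i in range(len(C)))


def seed_powers(D, C):
    """Assumed seed power for binary D: c_ii * (1 - c_ii) * terms in document i."""
    return [C[i][i] * (1.0 - C[i][i]) * sum(D[i]) for i in range(len(D))]


# Hypothetical toy collection: two pairs of documents with disjoint vocabularies.
D = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

C = cover_coefficient_matrix(D)
n_c = estimate_cluster_count(C)
powers = seed_powers(D, C)
print(n_c, powers)
```

Under these assumptions each row of C sums to 1, and the toy collection yields an estimate of two clusters, matching its two disjoint vocabulary groups. In a full procedure the documents with the highest seed powers would be taken as seeds and the remaining documents assigned to the seed that covers them most; that step is omitted here.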