Concepts of the cover coefficient-based clustering methodology

Authors:
Fazli Can;Esen A. Ozkarahan
Affiliations:
Dept. of Electrical and Electronic Engineering, Middle East Technical University, Ankara;Dept. of Computer Science, Arizona State University, Tempe, Arizona
Venue:
SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
1985

Abstract

Document clustering has several unresolved problems. Among them are high time and space complexity, difficulty of determining similarity thresholds, order dependence, nonuniform document distribution in clusters, and arbitrariness in the determination of various cluster initiators. To overcome these problems to some degree, the cover coefficient based clustering methodology has been introduced. The concepts used in this methodology have given rise to new concepts, relationships, and measures, such as the effect of indexing on clustering, optimal vocabulary generation for indexing, and a new matching function. These new concepts are discussed. The results of performance experiments that show the effectiveness of the clustering methodology and the matching function are also included. In these experiments, it was also observed that the majority of the documents retrieved in a search are concentrated in a few clusters that contain a low percentage of the documents in the database.
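
The abstract names the cover coefficient without defining it. The following is a minimal sketch, assuming the definition usually given for this methodology: for a binary document-by-term matrix D, the entry c_ij is the probability of selecting a term of document i at random and then, among the documents containing that term, selecting document j. The diagonal entries act as decoupling coefficients, and their sum is taken as an estimate of the number of clusters. The function names and the toy matrix are illustrative assumptions and do not come from the paper.

```python
def cover_coefficient_matrix(D):
    """Return the m x m cover coefficient matrix for a binary
    document-by-term matrix D, given as a list of equal-length rows.

    Assumed definition: c_ij = (1 / row_sum_i) * sum_k d_ik * d_jk / col_sum_k
    """
    m = len(D)
    n = len(D[0])
    row_sums = [sum(row) for row in D]
    col_sums = [sum(D[i][k] for i in range(m)) for k in range(n)]
    # Every document needs at least one term and every term at least one document.
    if any(s == 0 for s in row_sums) or any(s == 0 for s in col_sums):
        raise ValueError("D must have no all-zero rows or columns")

    C = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            shared = sum(D[i][k] * D[j][k] / col_sums[k] for k in range(n))
            C[i][j] = shared / row_sums[i]
    return C


def estimated_cluster_count(C):
    """Sum of the diagonal (decoupling coefficients) of C."""
    return sum(C[i][i] for i in range(len(C)))


# Toy collection (hypothetical): documents 0 and 1 share all their terms,
# document 2 shares none with the others.
D_example = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
C_example = cover_coefficient_matrix(D_example)
n_clusters = estimated_cluster_count(C_example)
print(C_example)
print(n_clusters)
```

In this toy case the two identical documents each keep half of their coverage for themselves, the isolated document keeps all of it, and the diagonal sums to 2, which matches the two groups one would pick by eye. Each row of the matrix sums to 1 under this definition.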