Hierarchic document classification using Ward's clustering method

Authors:
A. El-Hamdouchi; P. Willett
Affiliations:
Sheffield University, Western Bank, Sheffield, S10 2TN, UK; Sheffield University, Western Bank, Sheffield, S10 2TN, UK
Venue:
Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
1986

References (8):

Test of methods for evaluating bibliographic databases: an analysis of the National Library of Medicine's handling of literatures in the medical behavioral sciences
Journal of the American Society for Information Science

An Algorithm for Finding Best Matches in Logarithmic Expected Time
ACM Transactions on Mathematical Software (TOMS)

Optimal Expected-Time Algorithms for Closest Point Problems
ACM Transactions on Mathematical Software (TOMS)

The nearest neighbour problem in information retrieval: an algorithm using upperbounds
SIGIR '81 Proceedings of the 4th annual international ACM SIGIR conference on Information storage and retrieval: theoretical issues in information retrieval

Information Retrieval

Introduction to Modern Information Retrieval

A probabilistic algorithm for nearest neighbour searching
SIGIR '80 Proceedings of the 3rd annual ACM conference on Research and development in information retrieval

Parallel Computations in Information Retrieval

Cited by (20):

Scatter/Gather: a cluster-based approach to browsing large document collections
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval

Cluster-based text categorization: a comparison of category search strategies
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval

Fast and effective text mining using linear-time document clustering
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining

Document clustering based on cluster validation
Proceedings of the thirteenth ACM international conference on Information and knowledge management

Interpretable Hierarchical Clustering by Constructing an Unsupervised Decision Tree
IEEE Transactions on Knowledge and Data Engineering

Combining preference- and content-based approaches for improving document clustering effectiveness
Information Processing and Management: an International Journal

Using cluster validation criterion to identify optimal feature subset and cluster number for document clustering
Information Processing and Management: an International Journal

Accommodating Individual Preferences in the Categorization of Documents: A Personalized Clustering Approach
Journal of Management Information Systems

A collaborative filtering-based approach to personalized document clustering
Decision Support Systems

A Latent Semantic Indexing-based approach to multilingual document clustering
Decision Support Systems

Managing Word Mismatch Problems in Information Retrieval: A Topic-Based Query Expansion Approach
Journal of Management Information Systems

Monotone Increasing Binary Similarity and Its Application to Automatic Document-Acquisition of a Category
IEICE - Transactions on Information and Systems

Preserving User Preferences in Automated Document-Category Management: An Evolution-Based Approach
Journal of Management Information Systems

Re-ranking search results using language models of query-specific clusters
Information Retrieval

Parametric and nonparametric evolutionary computing with a content-based feature selection approach for parallel categorization
Expert Systems with Applications: An International Journal

Combining preference- and content-based approaches for improving document clustering effectiveness
Information Processing and Management: an International Journal

A survey on statistical disclosure control and micro-aggregation techniques for secure statistical databases
Software—Practice & Experience - Focus on Selected PhD Literature Reviews in the Practical Aspects of Software Technology

Cross-lingual text categorization: Conquering language boundaries in globalized environments
Information Processing and Management: an International Journal

SAM method as an approach to select candidates for human prostate cancer markers
BSB'05 Proceedings of the 2005 Brazilian conference on Advances in Bioinformatics and Computational Biology

Probabilistic co-relevance for query-sensitive similarity measurement in information retrieval
Information Processing and Management: an International Journal

Abstract

In this paper, we discuss the application of a recent hierarchic clustering algorithm to the automatic classification of files of documents. Whereas most hierarchic clustering algorithms involve the generation and updating of an inter-object dissimilarity matrix, this new algorithm is based upon a series of nearest neighbor searches. Such an approach is appropriate to several clustering methods, including Ward's method, which has been shown to perform well in experimental studies of hierarchic document clustering. We also describe heuristics that can increase the efficiency of the new algorithm when it is used to cluster three document collections by Ward's method.
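The abstract does not spell out the algorithm. One standard way to build a Ward hierarchy from nearest-neighbor searches alone is the nearest-neighbor chain (reciprocal nearest neighbor) scheme. The Python sketch below assumes that scheme and is not the authors' implementation. Each cluster is kept only as its size and centroid, and the Ward merge cost n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2 is computed on the fly instead of being read from a stored dissimilarity matrix. The dense document vectors, the function names `ward_cost` and `nn_chain_ward`, and the toy data are all hypothetical. The paper's efficiency heuristics are not reproduced.

```python
import math


def ward_cost(size_a, cent_a, size_b, cent_b):
    """Ward merge cost: the increase in the total within-cluster sum of
    squared errors if clusters a and b (given by size and centroid) merge."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(cent_a, cent_b))
    return size_a * size_b / (size_a + size_b) * sq_dist


def nn_chain_ward(vectors):
    """Hierarchic Ward clustering by repeated nearest-neighbor searches.

    Each cluster is stored only as (size, centroid); no inter-object
    dissimilarity matrix is built. Returns merges as
    (cluster_a, cluster_b, merge_cost, new_cluster_id); leaves are
    0..n-1 and each merge gets the next unused id.
    """
    clusters = {i: (1, [float(x) for x in v]) for i, v in enumerate(vectors)}
    next_id = len(vectors)
    merges = []
    chain = []  # each element is the nearest neighbor of the one before it
    while len(clusters) > 1:
        if not chain:
            chain.append(min(clusters))  # any start works; min id is deterministic
        top = chain[-1]
        prev = chain[-2] if len(chain) > 1 else None
        n_top, c_top = clusters[top]

        # Nearest-neighbor search from the top of the chain (linear scan).
        # On ties, prefer the previous chain element so the chain cannot cycle.
        best, best_cost = None, math.inf
        for cid, (n_c, c_c) in clusters.items():
            if cid == top:
                continue
            cost = ward_cost(n_top, c_top, n_c, c_c)
            if cost < best_cost or (cost == best_cost and cid == prev):
                best, best_cost = cid, cost

        if best == prev:
            # top and prev are reciprocal nearest neighbors: merge them.
            # Ward's method is reducible, so the rest of the chain stays valid.
            chain.pop()
            chain.pop()
            n_prev, c_prev = clusters.pop(prev)
            del clusters[top]
            n_new = n_top + n_prev
            c_new = [(n_top * a + n_prev * b) / n_new for a, b in zip(c_top, c_prev)]
            clusters[next_id] = (n_new, c_new)
            merges.append((min(top, prev), max(top, prev), best_cost, next_id))
            next_id += 1
        else:
            chain.append(best)
    return merges


if __name__ == "__main__":
    # Four toy "documents" as 2-term weight vectors (illustrative data only).
    docs = [[0, 0], [0, 1], [10, 0], [10, 1]]
    for a, b, cost, new_id in nn_chain_ward(docs):
        print(a, b, cost, new_id)
    # 0 1 0.5 4
    # 2 3 0.5 5
    # 4 5 100.0 6
```

Because each search is a linear scan and the chain is extended O(n) times overall, this sketch takes roughly quadratic time in the number of documents. It stores only the cluster centroids rather than a quadratic-size dissimilarity matrix. On the toy data, the three merge costs sum to 101. That equals the total squared distance of the four vectors from their overall centroid, as expected for Ward's criterion.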