Incremental hierarchical clustering of text documents

Authors:
Nachiketa Sahoo;Jamie Callan;Ramayya Krishnan;George Duncan;Rema Padman
Affiliations:
Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University
Venue:
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Year:
2006

Abstract

Incremental hierarchical clustering algorithms for text documents are important for organizing documents generated by streaming on-line sources such as newswire and blogs. However, this area is relatively unexplored in the text document clustering literature. Popular incremental hierarchical clustering algorithms, namely Cobweb and Classit, have not been widely used with text document data. We discuss why, in their current form, these algorithms are not suitable for text clustering and propose an alternative formulation that changes the underlying distributional assumption of the algorithm so that it conforms with the data. Both the original Classit algorithm and our proposed algorithm are evaluated using Reuters newswire articles and the Ohsumed dataset.