The global pattern mining step in existing pattern-based hierarchical clustering algorithms may produce an unpredictable number of patterns. In this paper, we propose IDHC, a pattern-based hierarchical clustering algorithm that builds a cluster hierarchy without mining for globally significant patterns. IDHC first discovers locally promising patterns by allowing each instance to "vote" for its representative size-2 patterns in a way that balances local pattern frequency against pattern significance in the dataset. The cluster hierarchy (i.e., the global model) is then constructed directly, using these locally promising patterns as features. Each pattern forms an initial (possibly overlapping) cluster, and the rest of the cluster hierarchy is obtained through an iterative cluster refinement process. By exploiting instance-to-cluster relationships, this process directly identifies the clusters for each level of the hierarchy and efficiently prunes duplicate clusters. Furthermore, IDHC produces more descriptive cluster labels (patterns are not artificially restricted) and adopts a soft clustering scheme that allows instances to exist in suitable nodes at various levels of the cluster hierarchy. We present results of experiments performed on 16 standard text datasets and show that IDHC outperforms state-of-the-art hierarchical clustering algorithms in terms of average entropy and FScore.
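The voting and initial-cluster steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the sketch assumes that a pair's global significance is its lift, that its local frequency is the smaller of the two term counts in the document, and that each document votes for its top `k` pairs. The function names `vote_for_patterns` and `initial_clusters`, the parameter `k`, and the toy documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def vote_for_patterns(docs, k=1):
    """Let each document vote for its k best size-2 patterns (term pairs).

    Assumed score (not from the paper): local frequency * lift.
    """
    n = len(docs)
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing each term pair
    for doc in docs:
        terms = sorted(doc)
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))

    votes = []
    for doc in docs:
        scored = []
        for a, b in combinations(sorted(doc), 2):
            # Global significance, assumed here to be lift.
            lift = n * pair_df[(a, b)] / (term_df[a] * term_df[b])
            # Local frequency, assumed here to be the rarer term's count.
            local = min(doc[a], doc[b])
            scored.append((local * lift, (a, b)))
        # Highest score first; ties are broken alphabetically.
        scored.sort(key=lambda s: (-s[0], s[1]))
        votes.append([pair for _, pair in scored[:k]])
    return votes


def initial_clusters(docs, votes):
    """Form one (possibly overlapping) cluster per voted pattern.

    Assumption: a cluster holds every document containing both terms.
    """
    clusters = {}
    for pairs in votes:
        for a, b in pairs:
            if (a, b) not in clusters:
                clusters[(a, b)] = {
                    i for i, d in enumerate(docs) if a in d and b in d
                }
    return clusters


# Toy corpus: each document maps a term to its in-document count.
docs = [
    {"data": 3, "mining": 2, "the": 1},
    {"data": 1, "mining": 4, "the": 1},
    {"soccer": 2, "goal": 3, "the": 1},
    {"soccer": 1, "goal": 1, "the": 2},
]
votes = vote_for_patterns(docs, k=1)
clusters = initial_clusters(docs, votes)
print(clusters)
```

In this toy corpus the ubiquitous term "the" never wins a vote, because its lift with any other term is low. This is the kind of balance between local frequency and dataset-level significance that the abstract describes. The iterative refinement that builds the remaining hierarchy levels is not sketched here, since the abstract does not specify it.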