Frequent term-based text clustering

Authors:
Florian Beil; Martin Ester; Xiaowei Xu
Affiliations:
Ludwig-Maximilians-Universitaet Muenchen, Munich, Germany; Simon Fraser University, Burnaby, BC, Canada; Siemens AG, Munich, Germany
Venue:
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2002


Abstract

Text clustering methods can be used to structure large sets of text or hypertext documents. The well-known methods of text clustering, however, do not adequately address the special problems of text clustering: the very high dimensionality of the data, the very large size of the databases, and the understandability of the cluster descriptions. In this paper, we introduce a novel approach that uses frequent item (term) sets for text clustering. Such frequent sets can be discovered efficiently using algorithms for association rule mining. To cluster based on frequent term sets, we measure the mutual overlap of frequent sets with respect to their sets of supporting documents. We present two algorithms for frequent term-based text clustering: FTC, which creates flat clusterings, and HFTC, which creates hierarchical clusterings. An experimental evaluation on classical text documents as well as on web documents demonstrates that the proposed algorithms obtain clusterings of comparable quality significantly more efficiently than state-of-the-art text clustering algorithms. Furthermore, our methods provide an understandable description of the discovered clusters by their frequent term sets.
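The abstract describes two steps. First, frequent term sets are mined with an association-rule-style algorithm. Second, clusters are chosen by measuring how much the sets of supporting documents overlap. The following is a minimal Python sketch of those two steps for a flat clustering, written under stated assumptions. It is not the authors' FTC implementation.

The sketch assumes that each document has already been reduced to a set of terms. The miner is a plain Apriori-style level-wise search. The abstract gives no formula for the overlap, so an entropy-based score is assumed here. Each document j in a candidate's cover contributes -(1/f_j) * ln(1/f_j), where f_j counts the remaining candidates that j supports. The function names, the `max_size` cap, and the tie-breaking rule are all hypothetical.

```python
from itertools import combinations
from math import log


def frequent_term_sets(docs, min_support, max_size=3):
    """Apriori-style level-wise search over documents given as sets of terms.

    Returns a dict mapping each frequent term set (a frozenset) to its cover:
    the frozenset of indices of the documents containing all of its terms.
    max_size is a hypothetical cap on term-set length, added for brevity.
    """
    covers = {}
    for i, doc in enumerate(docs):
        for term in doc:
            covers.setdefault(frozenset([term]), set()).add(i)
    level = {s: frozenset(c) for s, c in covers.items() if len(c) >= min_support}
    result = dict(level)
    size = 1
    while level and size < max_size:
        size += 1
        candidates = {}
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) != size or union in candidates:
                continue
            # Apriori pruning: every (size - 1)-subset must itself be frequent.
            if all(frozenset(sub) in level for sub in combinations(union, size - 1)):
                # A document supports a | b exactly when it supports a and b.
                candidates[union] = level[a] & level[b]
        level = {s: c for s, c in candidates.items() if len(c) >= min_support}
        result.update(level)
    return result


def entropy_overlap(cover, counts):
    """Assumed overlap score: sum of -(1/f_j) * ln(1/f_j) over documents j in
    the cover, where f_j = counts[j] is the number of remaining candidate term
    sets that document j supports. A document supporting only this candidate
    contributes 0."""
    return sum(-(1 / counts[j]) * log(1 / counts[j]) for j in cover)


def ftc(docs, min_support, max_size=3):
    """Greedy flat clustering sketch: repeatedly select the candidate with the
    smallest overlap, emit its not-yet-clustered documents as one cluster, and
    remove them. Returns (clusters, leftover); leftover holds documents that
    support no remaining frequent term set."""
    candidates = frequent_term_sets(docs, min_support, max_size)
    remaining = set(range(len(docs)))
    clusters = []
    while remaining:
        # Restrict each cover to unclustered documents; drop empty candidates.
        live = {s: c & remaining for s, c in candidates.items() if c & remaining}
        if not live:
            break
        counts = {j: 0 for j in remaining}
        for cover in live.values():
            for j in cover:
                counts[j] += 1
        # Hypothetical tie-break: prefer longer term sets, then alphabetical order.
        best = min(live, key=lambda s: (entropy_overlap(live[s], counts), -len(s), sorted(s)))
        clusters.append((set(best), set(live[best])))
        remaining -= live[best]
        del candidates[best]
    return clusters, remaining


# Usage on a toy collection with two obvious topics.
docs = [
    {"apple", "fruit", "juice"},
    {"apple", "fruit", "pie"},
    {"apple", "fruit"},
    {"engine", "car", "fuel"},
    {"engine", "car"},
    {"car", "engine", "wheel"},
]
clusters, leftover = ftc(docs, min_support=2)
# clusters == [({"apple", "fruit"}, {0, 1, 2}), ({"car", "engine"}, {3, 4, 5})]
```

Each cluster is labeled by the term set that selected it, which mirrors the abstract's point that frequent term sets double as cluster descriptions. The greedy loop removes a cluster's documents before rescoring, so each document lands in at most one cluster here. The hierarchical variant (HFTC) is not modeled.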