Optimizing convenient online access to bibliographic databases
Information Services and Use
BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Discovery of Multiple-Level Association Rules from Large Databases
VLDB '95 Proceedings of the 21st International Conference on Very Large Data Bases
Applying the Subdue Substructure Discovery System to the Chemical Toxicity Domain
Proceedings of the Twelfth International Florida Artificial Intelligence Research Society Conference
gSpan: Graph-Based Substructure Pattern Mining
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Diagonally Subgraphs Pattern Mining
Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
A study of topic similarity measures
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
An Efficient Algorithm for Discovering Frequent Subgraphs
IEEE Transactions on Knowledge and Data Engineering
Introduction to Data Mining, (First Edition)
Subdue: compression-based frequent pattern discovery in graph data
Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations
YALE: rapid prototyping for complex data mining tasks
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
GDClust: A Graph-Based Document Clustering Technique
ICDMW '07 Proceedings of the Seventh IEEE International Conference on Data Mining Workshops
Knowledge map creation and maintenance for virtual communities of practice
Information Processing and Management: an International Journal
Abstracting for Dimensionality Reduction in Text Classification
International Journal of Intelligent Systems
In this paper we introduce and analyze two improvements to GDClust [1], a system for document clustering based on the co-occurrence of frequent subgraphs. GDClust (Graph-Based Document Clustering) works with frequent senses derived from the constraints provided by natural language, rather than with the co-occurrences of frequent keywords commonly used in the vector space model (VSM) of document clustering. Text documents are transformed into hierarchical document-graphs, and an efficient graph-mining technique is used to find frequent subgraphs. The discovered frequent subgraphs are then used to generate accurate sense-based document clusters. The two novel mechanisms, the Subgraph-Extension Generator (SEG) and the Maximum Subgraph-Extension Generator (MaxSEG), directly use natural-language constraints to reduce the number of candidates and the overhead imposed by our first implementation of GDClust.
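The pipeline described above (document-graphs, frequent subgraphs, clusters from subgraph co-occurrence) can be illustrated with a minimal sketch. This is not the GDClust, SEG, or MaxSEG implementation. It assumes, purely for illustration, that each document-graph is a set of uniquely labeled directed edges, so that subgraph containment reduces to set inclusion, and that candidates are grown one adjacent edge at a time. The function names `frequent_subgraphs` and `similarity`, the toy edge labels, and the Jaccard-style score are all hypothetical choices, not taken from the paper.

```python
from itertools import chain


def touches(subgraph, edge):
    """Return True if `edge` shares a vertex with any edge in `subgraph`."""
    vertices = set(chain.from_iterable(subgraph))
    return edge[0] in vertices or edge[1] in vertices


def support(subgraph, doc_graphs):
    """Count the document-graphs that contain every edge of `subgraph`."""
    return sum(1 for g in doc_graphs if subgraph <= g)


def frequent_subgraphs(doc_graphs, min_support):
    """Level-wise search: grow frequent k-edge subgraphs by one adjacent edge.

    Assumption (not from the paper): edges are uniquely labeled, so a
    subgraph is simply a frozenset of (parent, child) pairs.
    """
    all_edges = frozenset().union(*doc_graphs)
    level = {
        frozenset([e])
        for e in all_edges
        if support(frozenset([e]), doc_graphs) >= min_support
    }
    frequent = set(level)
    frequent_edges = {e for s in level for e in s}
    while level:
        # Only extend with frequent edges that attach to the current subgraph;
        # this keeps candidates connected and prunes the search space.
        candidates = {
            s | {e}
            for s in level
            for e in frequent_edges
            if e not in s and touches(s, e)
        }
        level = {c for c in candidates if support(c, doc_graphs) >= min_support}
        frequent |= level
    return frequent


def similarity(doc_a, doc_b, frequent):
    """Jaccard overlap of the frequent subgraphs found in two documents."""
    in_a = {s for s in frequent if s <= doc_a}
    in_b = {s for s in frequent if s <= doc_b}
    union = in_a | in_b
    return len(in_a & in_b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy document-graphs over a made-up concept hierarchy.
    docs = [
        frozenset({("entity", "animal"), ("animal", "dog")}),
        frozenset({("entity", "animal"), ("animal", "dog"), ("animal", "cat")}),
        frozenset({("entity", "plant")}),
    ]
    freq = frequent_subgraphs(docs, min_support=2)
    print(len(freq), similarity(docs[0], docs[1], freq))
```

Restricting extensions to edges that attach to the current subgraph is one simple way to cut down the number of candidates. The mechanisms the paper proposes draw their constraints from natural language, which this sketch does not model.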