Web document cluster analysis plays an important role in information retrieval by organizing large numbers of documents into a small number of meaningful clusters. Traditional web document clustering is based on the Vector Space Model (VSM), which considers only two levels of knowledge granularity (document and term) and ignores the paragraph level that bridges them. This two-level granularity can produce unsatisfactory clustering results that exhibit "false correlation". To address the problem, a Hierarchical Representation Model with Multi-granularity (HRMM), which consists of a five-layer representation of data and a two-phase clustering process, is proposed based on granular computing and article structure theory. To deal with the zero-valued similarity problem that results from the sparse term-paragraph matrix, an ontology-based strategy and a tolerance-rough-set-based strategy are introduced into HRMM. By using granular computing, HRMM captures the structural knowledge hidden in documents more efficiently and effectively, and thus generates web document clusters of higher quality. Extensive experiments show that HRMM, HRMM with the tolerance-rough-set strategy, and HRMM with the ontology strategy all significantly outperform VSM and a representative non-VSM-based algorithm, WFP, in terms of F-score.
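The zero-valued similarity problem mentioned above can be illustrated with a small sketch. The code below is not the HRMM implementation; it is a generic tolerance-rough-set enrichment under stated assumptions: binary term weights, a co-occurrence threshold named `theta`, toy paragraphs, and helper names (`tolerance_classes`, `upper_approximation`) that are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tolerance_classes(paragraphs, theta):
    """Map each term to itself plus every term it co-occurs with
    in at least `theta` paragraphs (assumed tolerance relation)."""
    cooc = Counter()
    for terms in paragraphs:
        for u, v in combinations(sorted(set(terms)), 2):
            cooc[(u, v)] += 1
    vocab = {t for terms in paragraphs for t in terms}
    classes = {t: {t} for t in vocab}
    for (u, v), n in cooc.items():
        if n >= theta:
            classes[u].add(v)
            classes[v].add(u)
    return classes


def upper_approximation(terms, classes):
    """All terms whose tolerance class overlaps the paragraph's terms."""
    present = set(terms)
    return {t for t, cls in classes.items() if cls & present}


def to_vector(term_set):
    """Binary weights: a simplifying assumption, not the paper's scheme."""
    return {t: 1.0 for t in term_set}


# Toy paragraphs (hypothetical data); the first two share no terms.
paragraphs = [
    ["web", "document"],
    ["cluster", "granule"],
    ["web", "cluster"],
    ["web", "cluster", "document"],
]

classes = tolerance_classes(paragraphs, theta=2)

raw_sim = cosine(to_vector(paragraphs[0]), to_vector(paragraphs[1]))
enriched_sim = cosine(
    to_vector(upper_approximation(paragraphs[0], classes)),
    to_vector(upper_approximation(paragraphs[1], classes)),
)

print("raw similarity:", round(raw_sim, 3))
print("enriched similarity:", round(enriched_sim, 3))
```

In this toy data the first two paragraphs have no term in common, so their raw cosine similarity is zero. After each paragraph is replaced by its upper approximation, both contain "web" and "cluster", and the similarity becomes positive. How HRMM weights the added terms, and how its ontology-based strategy works, is not specified here.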