We propose a new data model for Web document representation based on granular computing, named the Expanded Vector Space Model (EVSM). Traditional Web document clustering relies on two levels of knowledge granularity: document and term. This can yield “false relevant” clustering results. In our approach, Web documents are represented at many levels of knowledge granularity. Knowledge granularity expressed in sufficiently conceptual sentences helps knowledge engineers understand valuable relations hidden in the data. With granularity calculation, data can be processed more efficiently and effectively, and knowledge engineers can handle the same dataset at different knowledge levels. This provides a sounder basis for interpreting the results of various data analysis methods. We evaluate the proposed approach experimentally and show that our algorithm is promising and efficient.