Clustering web documents based on knowledge granularity

Authors:
Faliang Huang; Shichao Zhang
Affiliations:
Faculty of Software, Fujian Normal University, Fuzhou, China; Department of Computer Science, Guangxi Normal University, Guilin, China
Venue:
APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development
Year:
2006

Abstract

We propose a new data model for Web document representation based on granulation computing, named the Expanded Vector Space Model (EVSM). Traditional Web document clustering relies on only two levels of knowledge granularity: document and term. This can produce clustering results that are “falsely relevant”. In our approach, Web documents are represented at multiple levels of knowledge granularity. Knowledge granularity expressed in sufficiently conceptual sentences helps knowledge engineers understand valuable relations hidden in the data. With granularity calculation, data can be processed more efficiently and effectively, and knowledge engineers can handle the same dataset at different knowledge levels. This provides a sounder basis for interpreting the results of various data analysis methods. We experimentally evaluate the proposed approach and demonstrate that our algorithm is promising and efficient.