Document representation is an essential step in web page clustering. Web pages are usually written in HTML, and their markup offers useful information for selecting the most important features to represent them. In this paper we investigate the use of nonlinear combinations of criteria, by means of a fuzzy system, to find those important features. We start from a term weighting function called Fuzzy Combination of Criteria (fcc), which relies on term frequency, document title, emphasis and term positions in the text. Next, we analyze its drawbacks and explore the possibility of adding contextual information extracted from the anchor texts of inlinks; based on our experimental results, we then propose an alternative way of combining the criteria. Finally, we apply a statistical significance test to compare the original representation with our proposal.
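The abstract does not give fcc's membership functions or rule base. What follows is therefore only a minimal Mamdani-style sketch of how a fuzzy system can combine term-weighting criteria nonlinearly, built on several assumptions. Each criterion (term frequency, title, emphasis, position, plus an anchor-text criterion standing in for the proposed inlink information) is assumed to be pre-normalized to [0, 1]. The triangular LOW/MEDIUM/HIGH sets, the `RULES` list and the `fuzzy_weight` helper are all hypothetical. The output weight comes from min-max inference with centroid defuzzification.

```python
"""Hypothetical sketch of a fuzzy combination of term-weighting criteria.

The fuzzy sets and rule base are illustrative assumptions, not the
published fcc rules. All criteria are assumed pre-normalized to [0, 1].
"""
from __future__ import annotations

# Triangular fuzzy sets (left foot, peak, right foot) over [0, 1].
SETS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# Hypothetical rules: (antecedent {criterion: set}, output set).
# Antecedent terms are combined with AND (min).
RULES = [
    ({"tf": "low"}, "low"),
    ({"tf": "medium"}, "medium"),
    ({"tf": "high"}, "high"),
    ({"tf": "medium", "title": "high"}, "high"),
    ({"tf": "medium", "emphasis": "high"}, "high"),
    ({"tf": "low", "position": "high"}, "medium"),
    # Assumed anchor-text rules: inlink anchors can raise a term's weight.
    ({"tf": "low", "anchor": "high"}, "medium"),
    ({"tf": "medium", "anchor": "high"}, "high"),
]


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with feet a, c and peak b (requires a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_weight(criteria: dict[str, float], steps: int = 101) -> float:
    """Mamdani inference: min for AND, max aggregation, centroid output.

    `criteria` must contain every criterion named in RULES; values are
    clamped to [0, 1].
    """
    grid = [i / (steps - 1) for i in range(steps)]
    aggregated = [0.0] * steps
    for antecedent, label in RULES:
        # Firing strength = minimum membership over the rule's antecedents.
        strength = min(
            tri(min(max(criteria[name], 0.0), 1.0), *SETS[term])
            for name, term in antecedent.items()
        )
        if strength == 0.0:
            continue
        a, b, c = SETS[label]
        # Clip the output set at the firing strength; aggregate with max.
        for i, y in enumerate(grid):
            aggregated[i] = max(aggregated[i], min(strength, tri(y, a, b, c)))
    area = sum(aggregated)
    if area == 0.0:
        return 0.0
    # Discrete centroid of the aggregated output fuzzy set.
    return sum(y * m for y, m in zip(grid, aggregated)) / area
```

With these hypothetical rules, a term of medium frequency gets a higher weight when it also appears in inlink anchor texts (for example, `fuzzy_weight({"tf": 0.5, "title": 0.0, "emphasis": 0.0, "position": 0.0, "anchor": 1.0})` exceeds the same call with `"anchor": 0.0`). A rule base lets the contribution of one criterion depend on the level of another, which a fixed linear weighting of the criteria does not.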