A tolerance rough set approach to clustering web search results

Authors:
Chi Lang Ngo; Hung Son Nguyen
Affiliations:
Warsaw University, Banacha 2, 02-097 Warsaw, Poland; Warsaw University, Banacha 2, 02-097 Warsaw, Poland
Venue:
PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2004

Citing: 0
Cited by: 13

Generating Concept Ontologies through Text Mining
WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence

A survey of Web clustering engines
ACM Computing Surveys (CSUR)

Web Document Classification Based on Rough Set
RSFDGrC '07 Proceedings of the 11th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing

Rough Set Based Personalized Recommendation in Mobile Commerce
AMT '09 Proceedings of the 5th International Conference on Active Media Technology

A rough set approach to classifying web page without negative examples
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining

Review:
The Knowledge Engineering Review

Rough set and ensemble learning based semi-supervised algorithm for text classification
Expert Systems with Applications: An International Journal

Diverse reduct subspaces based co-training for partially labeled data
International Journal of Approximate Reasoning

Interactive chinese search results clustering for personalization
WAIM'05 Proceedings of the 6th international conference on Advances in Web-Age Information Management

Clustering web documents based on knowledge granularity
APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development

Association rule centric clustering of web search results
MIWAI'11 Proceedings of the 5th international conference on Multi-Disciplinary Trends in Artificial Intelligence

Dynamic rule-based similarity model for DNA microarray data
Transactions on Rough Sets XV

Unsupervised Similarity Learning from Textual Data
Fundamenta Informaticae - Concurrency Specification and Programming (CS&P)

Quantified Score

Hi-index	0.01

Abstract

The two most popular approaches to facilitating information search on the web are web search engines and web directories. Although the performance of search engines is improving every day, searching on the web can be a tedious and time-consuming task due to the huge size and highly dynamic nature of the web. Moreover, the user's "intention behind the search" is often not clearly expressed, which results in short, overly general queries. The results returned by a search engine can number from hundreds to hundreds of thousands of documents. One approach to managing this large number of results is clustering. Search results clustering can be defined as the process of automatically grouping search results into thematic groups. However, in contrast to traditional document clustering, clustering of search results is done on the fly (per user query request) and locally, on the limited set of results returned by the search engine. Clustering of search results can help users navigate large sets of documents more efficiently. By providing concise, accurate descriptions of clusters, it lets users locate interesting documents faster.

In this paper, we propose an approach to search results clustering based on the Tolerance Rough Set model, following earlier work on document clustering [4,3]. Tolerance classes are used to approximate the concepts that exist in documents. The Tolerance Rough Set model was applied to document clustering as a way to enrich document and cluster representations, with the hope of improving clustering performance.
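
The idea of enriching a document through tolerance classes can be illustrated with a small sketch. The abstract does not spell out the tolerance relation, so the code below assumes the co-occurrence formulation commonly used in the Tolerance Rough Set literature: two terms are tolerant of each other if they appear together in at least theta documents, and a document's upper approximation is the set of all terms whose tolerance class overlaps the document. The function names, the threshold theta, and the toy snippets are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to the terms that co-occur with it in >= theta documents.

    Assumed relation (not taken from the abstract): co-occurrence counting.
    It is reflexive (a term is in its own class) and symmetric.
    """
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        # Count each unordered term pair once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1

    classes = {t: {t} for t in vocab}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return every term whose tolerance class shares a term with the document."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


# Hypothetical search-result snippets, already tokenized.
docs = [
    ["jaguar", "car", "speed"],
    ["jaguar", "car", "dealer"],
    ["jaguar", "cat", "jungle"],
    ["cat", "jungle", "animal"],
]
classes = tolerance_classes(docs, theta=2)
enriched = upper_approximation(["car", "dealer"], classes)
print(sorted(enriched))
```

Here the snippet containing only "car" and "dealer" gains the term "jaguar", because "car" and "jaguar" co-occur in two documents. This kind of enrichment lets short snippets that share no literal terms still be compared during clustering.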