An algorithm to cluster documents based on relevance

Authors:
Monica Desai; Amanda Spink
Affiliations:
Department of Computing Science and Engineering, The Pennsylvania State University, 220 Pond Laboratories, University Park, PA; School of Information Sciences, University of Pittsburgh, 610 IS Building, 135 N. Bellefield Avenue
Venue:
Information Processing and Management: an International Journal
Year:
2005

Abstract

Search engines do not clearly distinguish between items of varying relevance when presenting search results. Instead, they leave it to the user to judge which items are relevant, partially relevant, or not relevant. This burden often makes relevant and partially relevant documents harder to reach, particularly when the result set is large and documents of varying relevance are scattered throughout it. In this paper, we present a clustering scheme that groups the documents returned for a given search into relevant, partially relevant, and not relevant regions. The clusters were evaluated by end-users, who issued categorical, interval, and descriptive relevance judgments for the documents returned from a search. The degree of overlap between users and the system was measured for each clustered region to determine the overall effectiveness of the algorithm. The results indicate that clustering Web documents by regions of relevance is both necessary and feasible.
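
The abstract does not give the details of the clustering algorithm, so the following is only a minimal sketch of the general idea: place each document in one of three relevance regions, then measure per-region overlap between the system's placement and a user's judgments. It assumes each document already has a relevance score in [0, 1]. The thresholds, the function names (assign_region, cluster_by_region, region_overlap), and the overlap definition are illustrative assumptions, not the authors' method.

```python
REGIONS = ("relevant", "partially relevant", "not relevant")


def assign_region(score, high=0.66, low=0.33):
    """Map a relevance score in [0, 1] to one of three regions.

    The thresholds are hypothetical values chosen for illustration;
    they do not come from the paper.
    """
    if score >= high:
        return "relevant"
    if score >= low:
        return "partially relevant"
    return "not relevant"


def cluster_by_region(scores):
    """Group document ids by region, given a {doc_id: score} mapping."""
    clusters = {region: set() for region in REGIONS}
    for doc_id, score in scores.items():
        clusters[assign_region(score)].add(doc_id)
    return clusters


def group_judgments(judgments):
    """Group document ids by the region label a user assigned to them."""
    clusters = {region: set() for region in REGIONS}
    for doc_id, region in judgments.items():
        clusters[region].add(doc_id)
    return clusters


def region_overlap(system_clusters, user_clusters):
    """Per-region overlap between system and user placements.

    Overlap is defined here (an assumption) as the fraction of the
    documents a user put in a region that the system put in the same
    region. Regions the user left empty get None.
    """
    overlap = {}
    for region in REGIONS:
        user_docs = user_clusters[region]
        if not user_docs:
            overlap[region] = None
            continue
        shared = system_clusters[region] & user_docs
        overlap[region] = len(shared) / len(user_docs)
    return overlap


# Example with made-up scores and judgments for four documents.
system = cluster_by_region({"d1": 0.9, "d2": 0.5, "d3": 0.1, "d4": 0.7})
user = group_judgments({
    "d1": "relevant",
    "d2": "partially relevant",
    "d3": "not relevant",
    "d4": "partially relevant",
})
print(region_overlap(system, user))
```

A set-based overlap keeps the comparison independent of ranking order within a region. A study that collects interval or descriptive judgments, as described above, would first need to map them to these three categories.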