Improving quality of search results clustering with approximate matrix factorisations

Authors: Stanislaw Osinski
Affiliations: Poznan Supercomputing and Networking Center, Poznan, Poland
Venue: ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval
Year: 2006

Abstract

In this paper we show how approximate matrix factorisations can be used to organise document summaries returned by a search engine into meaningful thematic categories. We compare four factorisations (SVD, NMF, LNMF and K-Means/Concept Decomposition) with respect to topic separation capability, outlier detection and label quality. We also compare our approach with two other clustering algorithms: Suffix Tree Clustering (STC) and Tolerance Rough Set Clustering (TRC). For our experiments we use the standard merge-then-cluster approach, with the Open Directory Project web catalogue serving as the source of human-clustered document summaries.
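
As a rough illustration of the general idea in the abstract, the sketch below factorises a small term-document matrix with a truncated SVD (one of the four factorisations named above) and assigns each summary to the base vector it aligns with best. It is a minimal sketch under stated assumptions, not the algorithm evaluated in the paper: the toy summaries, the raw term-count weighting, the argmax assignment rule and the helper names `term_document_matrix`, `cluster_by_svd` and `top_terms` are all hypothetical choices made for this example.

```python
import numpy as np

# Toy document summaries (hypothetical data, not taken from the paper).
summaries = [
    "apple pie baking recipe",
    "apple tart baking recipe",
    "apple cake baking recipe",
    "python programming tutorial basics",
    "python programming tutorial advanced",
]


def term_document_matrix(docs):
    """Return (vocab, A), where A[i, j] is the weight of term i in document j."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: i for i, word in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            A[index[word], j] += 1.0
    # Scale every document column to unit Euclidean length.
    A /= np.linalg.norm(A, axis=0, keepdims=True)
    return vocab, A


def cluster_by_svd(A, k):
    """Assign each document to one of the k leading left singular vectors."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    base = U[:, :k]               # k orthonormal base vectors in term space
    scores = np.abs(base.T @ A)   # |cosine|, since columns of A have unit norm
    return scores.argmax(axis=0), base


def top_terms(vocab, base, cluster, n=3):
    """Return the n terms with the largest absolute weight in one base vector."""
    order = np.argsort(-np.abs(base[:, cluster]))[:n]
    return [vocab[i] for i in order]


vocab, A = term_document_matrix(summaries)
labels, base = cluster_by_svd(A, k=2)

for cluster in sorted(set(labels.tolist())):
    members = [i for i, c in enumerate(labels) if c == cluster]
    print(cluster, top_terms(vocab, base, cluster), members)
```

The heaviest terms of each base vector serve here as a crude cluster description. Under the same assumptions, a non-negative factorisation such as NMF could be substituted in `cluster_by_svd`, which would keep all base-vector weights non-negative.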