A method for pinpoint clustering of web pages with pseudo-clique search
Proceedings of the 2005 international conference on Federation over the Web
In this paper, we discuss a method for finding useful clusters of web pages, where a cluster is significant in the sense that the contents of its pages are similar or closely related to those of higher-ranked pages. Because we usually pay little attention to lower-ranked pages, they are discarded unconditionally even when their contents are similar to those of some high-ranked pages. We try to extract such hidden pages together with significant higher-ranked pages as a cluster. To obtain such clusters, we first extract semantic correlations among terms by applying Singular Value Decomposition (SVD) to the term-document matrix generated from a corpus on a specific topic. Based on these correlations, we evaluate potential similarities among the web pages from which we try to obtain clusters. The set of web pages is represented as a weighted graph G based on the similarities and the page ranks. Our clusters are found as pseudo-cliques in G. We present an algorithm for finding the Top-N weighted pseudo-cliques. Our experimental results show that valuable clusters can actually be extracted with our method.
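The graph step described above can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes the pairwise similarities have already been computed (for example, in an SVD-reduced space), it treats a pseudo-clique as a vertex set whose induced subgraph has edge density at least `min_density` (a simplifying assumption, since the abstract does not give the definition), and it scores a set by the sum of hypothetical rank-based page weights. The names `SIM`, `WEIGHTS`, `threshold`, and `min_density` are all illustrative, and the exhaustive enumeration is only practical for very small graphs.

```python
from itertools import combinations

# Hypothetical similarities among 5 pages (symmetric, illustrative values).
SIM = [
    [1.0, 0.9, 0.8, 0.2, 0.1],
    [0.9, 1.0, 0.85, 0.3, 0.1],
    [0.8, 0.85, 1.0, 0.65, 0.2],
    [0.2, 0.3, 0.65, 1.0, 0.7],
    [0.1, 0.1, 0.2, 0.7, 1.0],
]
# Hypothetical rank-based weights: pages 1 and 4 are low-ranked "hidden" pages.
WEIGHTS = [1.0, 0.1, 0.6, 0.5, 0.05]


def build_edges(sim, threshold):
    """Connect pages i < j when their similarity is at least `threshold`."""
    n = len(sim)
    return {(i, j) for i, j in combinations(range(n), 2) if sim[i][j] >= threshold}


def density(nodes, edges):
    """Fraction of vertex pairs in `nodes` that are joined by an edge."""
    pairs = list(combinations(sorted(nodes), 2))
    return sum(p in edges for p in pairs) / len(pairs)


def top_n_pseudo_cliques(sim, weights, threshold, min_density, n_top):
    """Brute-force search for the n_top heaviest maximal dense vertex sets."""
    edges = build_edges(sim, threshold)
    n = len(sim)
    # Every vertex set of size >= 2 that meets the density requirement.
    candidates = [
        frozenset(c)
        for size in range(2, n + 1)
        for c in combinations(range(n), size)
        if density(c, edges) >= min_density
    ]
    # Keep only sets that are not strictly contained in another candidate.
    maximal = [c for c in candidates if not any(c < other for other in candidates)]
    # Heaviest first; ties broken by vertex order for a deterministic result.
    maximal.sort(key=lambda c: (-sum(weights[v] for v in c), sorted(c)))
    return [sorted(c) for c in maximal[:n_top]]


if __name__ == "__main__":
    # Exact cliques only (density 1.0), then a relaxed density of 0.6.
    print(top_n_pseudo_cliques(SIM, WEIGHTS, 0.6, 1.0, 2))  # [[0, 1, 2], [2, 3]]
    print(top_n_pseudo_cliques(SIM, WEIGHTS, 0.6, 0.6, 2))  # [[0, 1, 2, 3], [2, 3, 4]]
```

In this toy data, the low-weight page 1 ends up in the heaviest cluster because it is strongly similar to the high-weight pages 0 and 2, which mirrors the idea of recovering hidden lower-ranked pages. Relaxing the density requirement from 1.0 to 0.6 lets the cluster grow to include page 3, which is the kind of tolerance a pseudo-clique is meant to provide over an exact clique.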