EATIS '07 Proceedings of the 2007 Euro American conference on Telematics and information systems
Statistical approach for improving the quality of search results
ACACOS'11 Proceedings of the 10th WSEAS international conference on Applied computer and applied computational science
Web data mining trends and techniques
Proceedings of the International Conference on Advances in Computing, Communications and Informatics
Search engines typically answer a query with low precision: they retrieve many useless web pages and miss some important ones. In this paper, we study the problem of hierarchically clustering web page search results. In particular, we propose an architecture called WISE [1], a meta-search engine that automatically builds clusters of related web pages, each embodying one meaning of the query. These clusters are then organized hierarchically, and each is labeled with a phrase representing the key concept of the cluster and its corresponding web documents. The system, which has a web-based interface (soon available at wise.di.ubi.pt), introduces some interesting new ideas, such as the pre-selection of the retrieved web pages, the capacity to statistically detect phrases within documents, and the representation of documents by their most relevant key concepts, obtained with web content mining techniques. The final step of the system is supported by a graph-based overlapping clustering algorithm, which groups the selected documents into a hierarchy of clusters.
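To make the idea of graph-based overlapping clustering concrete, the following is a minimal sketch and not the WISE algorithm itself, whose details the abstract does not give. It assumes each document is already reduced to a set of key concepts, links two documents when the Jaccard similarity of their concept sets reaches a threshold, and takes the maximal cliques of the resulting graph as clusters, so a document may belong to several clusters. The document names, concept sets, threshold value, and the labeling by shared concepts are all illustrative assumptions, and the hierarchy-building step is not covered.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of key concepts."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(docs, threshold):
    """Link two documents when their key-concept sets are similar enough."""
    graph = {name: set() for name in docs}
    for u, v in combinations(docs, 2):
        if jaccard(docs[u], docs[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques (basic Bron-Kerbosch, no pivoting).

    Cliques may share nodes, which is what makes the clustering overlapping.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return sorted(cliques)


def label(cluster, docs):
    """Hypothetical labeling rule: the concepts shared by every member."""
    return sorted(set.intersection(*(docs[d] for d in cluster)))


# Illustrative (made-up) results for the ambiguous query "jaguar".
docs = {
    "d1": {"jaguar", "car", "engine"},
    "d2": {"jaguar", "car", "dealer"},
    "d3": {"jaguar", "cat", "habitat"},
    "d4": {"jaguar", "cat", "car"},
}

graph = build_graph(docs, threshold=0.5)  # threshold is an assumed value
clusters = maximal_cliques(graph)
labels = [label(c, docs) for c in clusters]

for cluster, shared in zip(clusters, labels):
    print(cluster, "shared concepts:", shared)
```

In this toy input, document d4 mentions both the car and the cat sense of the query, so it ends up in two clusters. Maximal cliques are only one way to obtain overlapping groups from a similarity graph; they are used here because they are simple to state.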