Simple fast algorithms for the editing distance between trees and related problems
SIAM Journal on Computing
The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Grouper: a dynamic clustering interface to Web search results
WWW '99 Proceedings of the eighth international conference on World Wide Web
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
An Efficient k-Means Clustering Algorithm: Analysis and Implementation
IEEE Transactions on Pattern Analysis and Machine Intelligence
Mining the Web: Discovering Knowledge from HyperText Data
Mining the Web: Discovering Knowledge from HyperText Data
Mining topic-specific concepts and definitions on the web
WWW '03 Proceedings of the 12th international conference on World Wide Web
Automatic web news extraction using tree edit distance
Proceedings of the 13th international conference on World Wide Web
Learning to cluster web search results
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
The Anatomy of a Hierarchical Clustering Engine for Web-page, News and Book Snippets
ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Automatically Generating an E-textbook on the Web
World Wide Web
Hi-index | 0.00 |
The abundance of knowledge-rich information on the World Wide Web makes compiling an online e-textbook both possible and necessary. The authors of [7] proposed an approach to automatically generate an e-textbook by mining the ranking lists of the search engine. However, the performance of the approach was degraded by Web pages that were relevant but not actually discussing the desired concept. In this paper, we extend the work in [7] by applying a clustering approach before the mining process. The clustering approach serves as a post-processing stage to the original results retrieved by the search engine, and aims to reach an optimum state in which all Web pages assigned to a concept are discussing that exact concept.