A constrained crawling approach and its application to a specialised search engine
International Journal of Information and Communication Technology
We present a novel focused crawling method for extracting and processing cultural data from the web in a fully automated fashion. After downloading the pages, we extract from each document a number of words for each cultural thematic area. We then build multidimensional document vectors from the most frequently occurring words. The dissimilarity between these vectors is measured by the Hamming distance. In the last stage, we employ cluster analysis to partition the document vectors into a number of clusters. The approach is illustrated via a proof-of-concept application that examines hundreds of web pages spanning different cultural thematic areas.
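The pipeline described in the abstract (frequent-word vectors, Hamming dissimilarity, clustering) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how vectors are encoded or which clustering algorithm is used, so the binary word-presence encoding, the single-linkage agglomerative clustering, and all function names (`top_words`, `binary_vectors`, `hamming`, `single_link_clusters`) are assumptions made for illustration only.

```python
import re
from collections import Counter


def top_words(text, k):
    """Return the set of the k most frequent lowercase words in text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {word for word, _ in Counter(words).most_common(k)}


def binary_vectors(docs, k=2):
    """Encode each document as a 0/1 vector over the union of all top-k words.

    Assumption: a binary presence encoding; the abstract does not specify one.
    """
    tops = [top_words(doc, k) for doc in docs]
    vocab = sorted(set().union(*tops))
    vectors = [[1 if word in top else 0 for word in vocab] for top in tops]
    return vectors, vocab


def hamming(u, v):
    """Number of positions at which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def single_link_clusters(vectors, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Assumption: single-linkage agglomerative clustering stands in for the
    unspecified cluster analysis in the abstract.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = min(
                    hamming(vectors[i], vectors[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Hypothetical toy documents from two cultural thematic areas.
docs = [
    "museum museum painting painting painting gallery",
    "museum museum museum painting painting sculpture",
    "folk folk dance dance dance music",
    "folk folk folk dance dance festival",
]
vectors, vocab = binary_vectors(docs, k=2)
clusters = single_link_clusters(vectors, n_clusters=2)
```

In this toy input, the two art-related documents share the same top words and end up in one cluster, and the two folk-related documents in the other. A real crawler would additionally need page download, HTML cleaning, and stop-word handling, none of which are shown here.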