A Clustering Framework to Build Focused Web Crawlers for Automatic Extraction of Cultural Information

Authors:
George E. Tsekouras;Damianos Gavalas;Stefanos Filios;Antonios D. Niros;George Bafaloukas
Affiliations:
Department of Cultural Technology and Communication, University of the Aegean, Lesvos, Greece 81100;Department of Cultural Technology and Communication, University of the Aegean, Lesvos, Greece 81100;Department of Cultural Technology and Communication, University of the Aegean, Lesvos, Greece 81100;Department of Cultural Technology and Communication, University of the Aegean, Lesvos, Greece 81100;Department of Cultural Technology and Communication, University of the Aegean, Lesvos, Greece 81100
Venue:
SETN '08 Proceedings of the 5th Hellenic conference on Artificial Intelligence: Theories, Models and Applications
Year:
2008

Abstract

We present a novel focused crawling method for extracting and processing cultural data from the web in a fully automated fashion. After downloading the pages, we extract from each document a number of words for each thematic cultural area. We then create multidimensional document vectors comprising the most frequent word occurrences, and measure the dissimilarity between these vectors by the Hamming distance. In the last stage, we employ cluster analysis to partition the document vectors into a number of clusters. We illustrate the approach with a proof-of-concept application that scrutinizes hundreds of web pages spanning different cultural thematic areas.
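
The pipeline outlined in the abstract (word extraction, document vectors, Hamming distance, clustering) can be sketched roughly as follows. This is only an illustration under several assumptions that the abstract does not confirm: vectors are taken to be binary indicators over a fixed thematic vocabulary, a word is set to 1 when it is among the document's k most frequent vocabulary words, and a plain k-medoids loop stands in for the unspecified cluster analysis. The names `to_vector`, `hamming`, and `k_medoids`, the vocabulary, and the sample texts are all hypothetical.

```python
import re
from collections import Counter


def to_vector(text, vocabulary, k=3):
    """Binary vector: 1 if a vocabulary word is among the k most frequent
    vocabulary words of the text (assumed encoding, not the paper's)."""
    vocab = set(vocabulary)
    tokens = re.findall(r"[a-z]+", text.lower())  # ASCII-only tokenizer
    counts = Counter(t for t in tokens if t in vocab)
    # Rank by descending count, ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top = {word for word, _ in ranked[:k]}
    return [1 if word in top else 0 for word in vocabulary]


def hamming(u, v):
    """Number of positions at which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def k_medoids(vectors, medoids, max_iter=20):
    """Simple k-medoids under Hamming distance (a stand-in algorithm).
    `medoids` is a list of distinct initial indices into `vectors`."""
    medoids = list(medoids)
    for _ in range(max_iter):
        # Assignment step: a medoid always stays in its own cluster.
        clusters = {m: [] for m in medoids}
        for i, v in enumerate(vectors):
            nearest = i if i in clusters else min(
                medoids, key=lambda m: hamming(v, vectors[m]))
            clusters[nearest].append(i)
        # Update step: pick the member with the least total distance.
        new_medoids = [
            min(members, key=lambda c: sum(
                hamming(vectors[c], vectors[j]) for j in members))
            for members in clusters.values()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return list(clusters.values())


# Hypothetical vocabulary and pages covering two thematic areas.
vocabulary = ["museum", "exhibition", "painting", "theatre", "play", "stage"]
pages = [
    "museum exhibition painting museum painting",
    "museum exhibition opens",
    "theatre play on stage theatre",
    "new play at the theatre",
]
vectors = [to_vector(p, vocabulary) for p in pages]
clusters = k_medoids(vectors, medoids=[0, 1])
print(clusters)
```

On this toy input the two museum pages and the two theatre pages end up in separate clusters. The tokenizer only matches ASCII letters, so non-English pages would need a different word extraction step.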