Discovery of environmental nodes in the web

Authors:
Anastasia Moumtzidou;Stefanos Vrochidis;Sara Tonelli;Ioannis Kompatsiaris;Emanuele Pianta
Affiliations:
Informatics and Telematics Institute, Thessaloniki, Greece;Informatics and Telematics Institute, Thessaloniki, Greece;FBK, Trento, Italy;Informatics and Telematics Institute, Thessaloniki, Greece;FBK, Trento, Italy
Venue:
IRFC'12 Proceedings of the 5th conference on Multidisciplinary Information Retrieval
Year:
2012

Abstract

The analysis and processing of environmental information is considered to be of utmost importance for humanity. This article addresses the problem of discovering web resources that provide environmental measurements. To solve this domain-specific search problem, we combine state-of-the-art search techniques with advanced textual processing and supervised machine learning. Specifically, we generate domain-specific queries from empirical information and apply machine-learning-driven query expansion to enrich the initial queries with domain-specific terms. Multiple variations of these queries are submitted to a general-purpose web search engine to achieve high recall, and a post-processing module based on supervised machine learning improves the precision of the final results. In this work, we focus on the discovery of weather forecast websites, and we evaluate our technique by discovering weather nodes for southern Finland.
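
The two-stage idea in the abstract (many query variations for recall, then a filtering step for precision) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_queries` and `discover`, the `search` callable standing in for a general-purpose search engine, and the `is_relevant` callable standing in for the trained classifier are all hypothetical, and the expansion terms are assumed to be supplied by some earlier step.

```python
from itertools import product
from typing import Callable, Iterable, List


def build_queries(base_terms: Iterable[str],
                  expansion_terms: Iterable[str],
                  locations: Iterable[str]) -> List[str]:
    """Build query variations: each base term, with and without an
    expansion term, combined with each location name."""
    queries: List[str] = []
    # The empty string yields the unexpanded variant of each query.
    extras = [""] + list(expansion_terms)
    for base, extra, place in product(list(base_terms), extras, list(locations)):
        query = " ".join(part for part in (base, extra, place) if part)
        if query not in queries:
            queries.append(query)
    return queries


def discover(queries: Iterable[str],
             search: Callable[[str], Iterable[str]],
             is_relevant: Callable[[str], bool]) -> List[str]:
    """Pool the results of all queries (recall stage), then keep only
    the URLs accepted by the filter (precision stage)."""
    seen = set()
    kept: List[str] = []
    for query in queries:
        for url in search(query):
            if url in seen:
                continue  # already judged via an earlier query
            seen.add(url)
            if is_relevant(url):
                kept.append(url)
    return kept
```

In use, `search` would wrap whatever search engine client is available and `is_relevant` would wrap a classifier trained on labelled pages; both are passed in as plain callables so the sketch stays independent of any particular service or library.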