Automatic classification of web databases using domain-dictionaries

Authors:
Heidy M. Marin-Castro;Victor J. Sosa-Sosa;Ivan Lopez-Arevalo;Hugo Jair Escalante-Baldera
Affiliations:
Center of Research and Advanced Studies of the National Polytechnic Institute, Information Technology Laboratory, Victoria City, Tamaulipas, Mexico;Center of Research and Advanced Studies of the National Polytechnic Institute, Information Technology Laboratory, Victoria City, Tamaulipas, Mexico;Center of Research and Advanced Studies of the National Polytechnic Institute, Information Technology Laboratory, Victoria City, Tamaulipas, Mexico;National Institute for Astrophysics, Optics and Electronics, Tonantzintla, Puebla, Mexico
Venue:
MLDM'13 Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2013

Abstract

The identification, classification and integration of databases on the Web (also called web databases) as information sources remains a great challenge because of their constant growth and diversification. Classifying such web databases according to their application domain is an important step towards the integration of deep web sources. Moreover, given the design and content heterogeneity among web databases, their automatic classification is both a great challenge and a highly demanded task, requiring techniques that can cluster web databases according to the domains they belong to. In this paper we present a strategy for the automatic classification of web databases based on a new supervised approach. This strategy uses the visible information available on a group of domain-specific Web Query Interfaces (WQIs) to construct a dictionary, or lexicon, that better describes a particular domain of interest. The dictionary is enriched with synonyms. In our experiments, the dictionary was built from a set of randomly selected domain-specific WQIs. Automatic WQI classification based on dictionaries generated in this way showed efficient and competitive results compared with related work.
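
The general idea of dictionary-based WQI classification can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the tokenization, the synonym resource, or the scoring rule, so the regular-expression tokenizer, the hand-written `SYNONYMS` table, the overlap score, and all function names below are assumptions made purely for illustration.

```python
import re

# Hypothetical synonym table. The abstract only says the dictionary is
# "enriched with synonyms"; the actual resource is not named there.
SYNONYMS = {
    "author": ["writer"],
    "departure": ["leaving"],
}


def tokenize(text):
    """Lowercase the visible text of an interface and split it into words."""
    return re.findall(r"[a-z]+", text.lower())


def build_dictionary(interface_texts, synonyms=SYNONYMS):
    """Collect the terms seen on sample interfaces of one domain,
    then add the known synonyms of those terms."""
    terms = set()
    for text in interface_texts:
        terms.update(tokenize(text))
    for term in list(terms):
        terms.update(synonyms.get(term, []))
    return terms


def classify(interface_text, dictionaries):
    """Return the domain whose dictionary covers the largest share of the
    interface's tokens (an assumed scoring rule), or None for empty text."""
    tokens = tokenize(interface_text)
    if not tokens:
        return None
    scores = {
        domain: sum(token in terms for token in tokens) / len(tokens)
        for domain, terms in dictionaries.items()
    }
    return max(scores, key=scores.get)


# Toy training data standing in for the visible text of labelled WQIs.
dictionaries = {
    "books": build_dictionary([
        "Title Author ISBN Publisher",
        "Search books by author or keyword",
    ]),
    "flights": build_dictionary([
        "Departure city Arrival city Date Passengers",
        "From To Depart Return",
    ]),
}

books_label = classify("Writer name and title", dictionaries)
flights_label = classify("Leaving from which city", dictionaries)
```

In this toy setup the first query interface is matched to the book domain only because "writer" was added as a synonym of "author", which is the kind of gain synonym enrichment is meant to provide. A real system would need a proper lexical resource and a scoring rule validated against labelled interfaces.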