We address the problem of identifying the domain of online databases. More precisely, given a set F of Web forms automatically gathered by a focused crawler and an online database domain D, our goal is to select from F only the forms that are entry points to databases in D. Having a set of Web forms that serve as entry points to similar online databases is a requirement for many applications and techniques that aim to extract and integrate hidden-Web information, such as meta-searchers, online database directories, hidden-Web crawlers, and form-schema matching and merging.

We propose a new strategy that automatically and accurately classifies online databases based on features that can be easily extracted from Web forms. By judiciously partitioning the space of form features, this strategy allows the use of simpler classifiers that can be constructed using learning techniques that are better suited for the features of each partition. Experiments using real Web data in a representative set of domains show that the use of different classifiers leads to high accuracy, precision and recall. This indicates that our modular classifier composition provides an effective and scalable solution for classifying online databases.
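
To make the idea of composing simpler classifiers over a partitioned feature space concrete, the following is a minimal sketch and not the classifiers used in the paper. It assumes one possible partition, structural features of a form versus its text, and every name in it (`Form`, `looks_searchable`, `NaiveBayesText`, `select_forms`), the hand-written structural rule, and the toy training data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Form:
    # Partition 1: structural features (a hypothetical choice of features).
    n_text_inputs: int
    n_password_inputs: int
    # Partition 2: the textual content of the form.
    text: str


def looks_searchable(form: Form) -> bool:
    """Structural classifier: a hand-written rule standing in for a learned model."""
    return form.n_text_inputs >= 1 and form.n_password_inputs == 0


class NaiveBayesText:
    """Tiny multinomial naive Bayes with add-one smoothing over form text."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = {}
        for text, label in zip(texts, labels):
            self.word_counts.setdefault(label, Counter()).update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def select_forms(forms, text_clf, domain):
    """Compose the two classifiers: keep forms accepted by both partitions."""
    return [
        f for f in forms
        if looks_searchable(f) and text_clf.predict(f.text) == domain
    ]


# Toy training data (invented for this sketch).
clf = NaiveBayesText().fit(
    ["title author isbn publisher", "author book title keyword",
     "departure arrival airline date", "flight from to passengers date"],
    ["books", "books", "airfare", "airfare"],
)

forms = [
    Form(1, 0, "search book by title or author"),
    Form(2, 1, "login author title"),
    Form(2, 0, "flight departure date"),
]
selected = select_forms(forms, clf, "books")
print([f.text for f in selected])
```

Each classifier in this sketch sees only the features of its own partition, so the structural rule and the text model can be replaced or retrained independently of each other.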