Document categorization as a technique to improve the retrieval of useful documents has been extensively investigated. One important issue in a large-scale metasearch engine is to select text databases that are likely to contain useful documents for a given query. We believe that database categorization can be a potentially effective technique for good database selection, especially in the Internet environment where short queries are usually submitted. In this paper, we propose and evaluate several database categorization algorithms. This study indicates that while some document categorization algorithms could be adopted for database categorization, algorithms that take into consideration the special characteristics of databases may be more effective. Preliminary experimental results are provided to compare the proposed database categorization algorithms.