The contents of many valuable web-accessible databases are only accessible through search interfaces and are hence invisible to traditional web “crawlers.” Recent studies have estimated the size of this “hidden web” to be 500 billion pages, while the size of the “crawlable” web is only an estimated two billion pages. Recently, commercial web sites have started to manually organize web-accessible databases into Yahoo!-like hierarchical classification schemes. In this paper, we introduce a method for automating this classification process by using a small number of query probes. To classify a database, our algorithm does not retrieve or inspect any documents or pages from the database, but rather just exploits the number of matches that each query probe generates at the database in question. We have conducted an extensive experimental evaluation of our technique over collections of real documents, including over one hundred web-accessible databases. Our experiments show that our system has low overhead and achieves high classification accuracy across a variety of databases.
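The idea of classifying a database from match counts alone can be illustrated with a minimal sketch. This is not the paper's algorithm: the probe words, the `count_matches` callback, the `min_share` threshold, and the share-of-matches decision rule are all assumptions made for illustration. The sketch only shows the general shape of the approach, in which each category has a few query probes, the database reports how many documents match each probe, and no documents are ever retrieved.

```python
from typing import Callable, Dict, List


def classify_by_probes(
    count_matches: Callable[[str], int],
    probes: Dict[str, List[str]],
    min_share: float = 0.5,
) -> List[str]:
    """Return the categories whose probes draw at least `min_share` of all matches.

    `count_matches` is a hypothetical callback that sends one query to the
    database's search interface and returns only the reported number of
    matches. `min_share` is an illustrative threshold, not a value from
    the paper.
    """
    # Sum the reported match counts over each category's probes.
    totals = {
        category: sum(count_matches(query) for query in queries)
        for category, queries in probes.items()
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No probe matched anything, so there is no evidence for any category.
        return []
    return [
        category
        for category, matches in totals.items()
        if matches / grand_total >= min_share
    ]


# Toy stand-in for a real search interface: fixed match counts per probe.
toy_counts = {"cancer": 900, "diabetes": 600, "soccer": 20, "baseball": 30, "stocks": 50}
toy_probes = {
    "Health": ["cancer", "diabetes"],
    "Sports": ["soccer", "baseball"],
    "Finance": ["stocks"],
}

result = classify_by_probes(lambda q: toy_counts.get(q, 0), toy_probes)
print(result)
```

In the toy data the Health probes account for 1500 of the 1600 reported matches, so only Health clears the assumed threshold. A real deployment would replace the lambda with code that submits each probe to the database's search form and parses the reported hit count.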