Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal
The mathematical foundations of learning machines
The mathematical foundations of learning machines
C4.5: programs for machine learning
C4.5: programs for machine learning
Automated learning of decision rules for text categorization
ACM Transactions on Information Systems (TOIS)
A comparison of classifiers and document representations for the routing problem
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Training algorithms for linear text classifiers
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Learning to Understand Information on the Internet: AnExample-Based Approach
Journal of Intelligent Information Systems - Special issue: next generation information technologies and systems
Wrappers for feature subset selection
Artificial Intelligence - Special issue on relevance
The SMART and SIRE experimental retrieval systems
Readings in information retrieval
Inductive learning algorithms and representations for text categorization
Proceedings of the seventh international conference on Information and knowledge management
Machine Learning - Special issue on applications of machine learning and the knowledge discovery process
Effective retrieval with distributed collections
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Methods for information server selection
ACM Transactions on Information Systems (TOIS)
Automatic discovery of language models for text databases
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Scalable collection summarization and selection
Proceedings of the fourth ACM conference on Digital libraries
GlOSS: text-source discovery over the Internet
ACM Transactions on Database Systems (TODS)
Server selection on the World Wide Web
DL '00 Proceedings of the fifth ACM conference on Digital libraries
Snowball: extracting relations from large plain-text collections
DL '00 Proceedings of the fifth ACM conference on Digital libraries
Query routing for Web search engines: architectures and experiments
Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking
Probe, count, and classify: categorizing hidden web databases
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
PERSIVAL demo: categorizing hidden-web resources
Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries
Query-based sampling of text databases
ACM Transactions on Information Systems (TOIS)
Mining the web to create minority language corpora
Proceedings of the tenth international conference on Information and knowledge management
Extracting query modifications from nonlinear SVMs
Proceedings of the 11th international conference on World Wide Web
Information Retrieval
Machine Learning
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
A Tutorial on Support Vector Machines for Pattern Recognition
Data Mining and Knowledge Discovery
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Hierarchically Classifying Documents Using Very Few Words
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Determining Text Databases to Search in the Internet
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Proceedings of the 27th International Conference on Very Large Data Bases
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Detection of Heterogeneities in a Multiple Text Database Environment
COOPIS '99 Proceedings of the Fourth IECIS International Conference on Cooperative Information Systems
Concept Hierarchy Based Text Database Categorization in a Metasearch Engine Environment
WISE '00 Proceedings of the First International Conference on Web Information Systems Engineering (WISE'00)-Volume 1 - Volume 1
Extracting comprehensible models from trained neural networks
Extracting comprehensible models from trained neural networks
Using machine learning to improve information access
Using machine learning to improve information access
Distributed search over the hidden web: hierarchical database sampling and selection
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Learning trees and rules with set-valued features
AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 1
Probe, Cluster, and Discover: Focused Extraction of QA-Pagelets from the Deep Web
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
When one sample is not enough: improving text database selection using shrinkage
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Query-related data extraction of hidden web documents
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Unified utility maximization framework for resource selection
Proceedings of the thirteenth ACM international conference on Information and knowledge management
A two-phase sampling technique for information extraction from hidden web databases
Proceedings of the 6th annual ACM international workshop on Web information and data management
Discovering and ranking web services with BASIL: a personalized approach with biased focus
Proceedings of the 2nd international conference on Service oriented computing
Guiding queries to information sources with InfoBeacons
Proceedings of the 5th ACM/IFIP/USENIX international conference on Middleware
Modeling and Managing Content Changes in Text Databases
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Modeling search engine effectiveness for federated search
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
QA-Pagelet: Data Preparation Techniques for Large-Scale Data Analysis of the Deep Web
IEEE Transactions on Knowledge and Data Engineering
Information source selection for resource constrained environments
ACM SIGMOD Record
Two-stage statistical language models for text database selection
Information Retrieval
Categorizing web search results into meaningful and stable categories using fast-feature techniques
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
To search or to crawl?: towards a query optimizer for text-centric tasks
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Clustering web images using association rules, interestingness measures, and hypergraph partitions
ICWE '06 Proceedings of the 6th international conference on Web engineering
Social networks, incentives, and search
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Capturing collection size for distributed non-cooperative retrieval
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Sampling, information extraction and summarisation of hidden web databases
Data & Knowledge Engineering - Special issue: WIDM 2004
Combining classifiers to identify online databases
Proceedings of the 16th international conference on World Wide Web
Modeling and managing changes in text databases
ACM Transactions on Database Systems (TODS)
Using query logs to establish vocabularies in distributed information retrieval
Information Processing and Management: an International Journal
Federated text retrieval from uncooperative overlapped collections
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Updating collection representations for federated search
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
DeepBot: a focused crawler for accessing hidden web content
Proceedings of the 3rd international workshop on Data enginering issues in E-commerce and services: In conjunction with ACM Conference on Electronic Commerce (EC '07)
DejaView: a personal virtual computer recorder
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Classification-aware hidden-web text database selection
ACM Transactions on Information Systems (TOIS)
CCReSD: concept-based categorisation of Hidden Web databases
International Journal of High Performance Computing and Networking
Mining world knowledge for analysis of search engine content
Web Intelligence and Agent Systems
Automatic Hidden Web Database Classification
PKDD 2007 Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases
Efficient Top-k Data Sources Ranking for Query on Deep Web
WISE '08 Proceedings of the 9th international conference on Web Information Systems Engineering
Proceedings of the VLDB Endowment
Robust result merging using sample-based score estimates
ACM Transactions on Information Systems (TOIS)
A Topic-Based Measure of Resource Description Quality for Distributed Information Retrieval
ECIR '09 Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval
ODE: Ontology-assisted data extraction
ACM Transactions on Database Systems (TODS)
Effective query expansion for federated search
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Kosmix: high-performance topic exploration using the deep web
Proceedings of the VLDB Endowment
Commercial Internet filters: Perils and opportunities
Decision Support Systems
Record extraction based on user feedback and document selection
APWeb/WAIM'07 Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management
Schema mapping in p2p networks based on classification and probing
DASFAA'07 Proceedings of the 12th international conference on Database systems for advanced applications
Finding and using the content texts of HTML pages
AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
Crawling the content hidden behind web forms
ICCSA'07 Proceedings of the 2007 international conference on Computational science and Its applications - Volume Part II
Data aspects in a relational database
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
On building a search interface discovery system
RED'09 Proceedings of the 2nd international conference on Resource discovery
Aspect-oriented relational algebra
Proceedings of the 14th International Conference on Extending Database Technology
Foundations and Trends in Information Retrieval
A multi-collection latent topic model for federated search
Information Retrieval
Automatic hierarchical classification of structured deep web databases
WISE'06 Proceedings of the 7th international conference on Web Information Systems
Sample sizes for query probing in uncooperative distributed information retrieval
APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development
A TNATS approach to hidden web documents
ICDCIT'04 Proceedings of the First international conference on Distributed Computing and Internet Technology
Supporting data aspects in pig latin
Proceedings of the 12th annual international conference on Aspect-oriented software development
Assessing relevance and trust of the deep web sources and results based on inter-source agreement
ACM Transactions on the Web (TWEB)
Towards social data platform: automatic topic-focused monitor for twitter stream
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
The contents of many valuable Web-accessible databases are only available through search interfaces and are hence invisible to traditional Web "crawlers." Recently, commercial Web sites have started to manually organize Web-accessible databases into Yahoo!-like hierarchical classification schemes. Here we introduce QProber, a modular system that automates this classification process by using a small number of query probes, generated by document classifiers. QProber can use a variety of types of classifiers to generate the probes. To classify a database, QProber does not retrieve or inspect any documents or pages from the database, but rather just exploits the number of matches that each query probe generates at the database in question. We have conducted an extensive experimental evaluation of QProber over collections of real documents, experimenting with different types of document classifiers and retrieval models. We have also tested our system with over one hundred Web-accessible databases. Our experiments show that our system has low overhead and achieves high classification accuracy across a variety of databases.