A two-phase sampling technique for information extraction from hidden web databases
Proceedings of the 6th annual ACM international workshop on Web information and data management
Hidden Web databases maintain a collection of specialised documents, which are dynamically generated using page templates. This paper presents the Two-Phase Sampling (2PS) technique that detects and extracts query-related information from documents contained in databases. 2PS is based on a two-phase framework for the sampling, information extraction and summarisation of Hidden Web documents. In the first phase, 2PS samples and stores documents for further analysis. In the second phase, it detects Web page templates from sampled documents and extracts relevant information from which a content summary is then generated. Experimental results demonstrate that 2PS effectively eliminates irrelevant information from sampled documents and generates terms and frequencies with improved accuracy.
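The two-phase flow described above can be illustrated with a minimal sketch. This is not the 2PS implementation: the abstract does not say how templates are detected, so the sketch assumes a simple heuristic in which any line shared by most sampled documents is treated as template text. The names `sample_documents`, `detect_template`, `content_summary`, `toy_search`, `TOY_DB` and the `threshold` parameter are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stand-in for a Hidden Web database: every page shares a template.
TOY_DB = [
    "<h1>Health Portal</h1>\nasthma treatment options\n<p>Copyright Health Portal</p>",
    "<h1>Health Portal</h1>\ndiabetes diet advice\n<p>Copyright Health Portal</p>",
    "<h1>Health Portal</h1>\nasthma inhaler guidance\n<p>Copyright Health Portal</p>",
]


def toy_search(term):
    """Hypothetical query interface: return every page containing the term."""
    return [doc for doc in TOY_DB if term in doc]


def sample_documents(search, query_terms, max_docs=50):
    """Phase 1: submit queries and store the distinct documents returned."""
    sampled, seen = [], set()
    for term in query_terms:
        for doc in search(term):
            if doc not in seen and len(sampled) < max_docs:
                seen.add(doc)
                sampled.append(doc)
    return sampled


def detect_template(docs, threshold=0.8):
    """Phase 2a (assumed heuristic): lines found in at least `threshold`
    of the sampled documents are treated as template text."""
    counts = Counter()
    for doc in docs:
        # Count each distinct non-empty line once per document.
        counts.update({line.strip() for line in doc.splitlines() if line.strip()})
    return {line for line, n in counts.items() if n / len(docs) >= threshold}


def content_summary(docs, template):
    """Phase 2b: term frequencies over the non-template lines only."""
    freqs = Counter()
    for doc in docs:
        for line in doc.splitlines():
            line = line.strip()
            if line and line not in template:
                freqs.update(re.findall(r"[a-z]+", line.lower()))
    return freqs


docs = sample_documents(toy_search, ["asthma", "diet"])
template = detect_template(docs)
summary = content_summary(docs, template)
print(summary.most_common(3))
```

In this toy run the header and footer lines occur in every sampled page, so they are dropped before counting and template words such as "portal" do not appear in the summary. A real template detector would need to work on page structure rather than exact line matches.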