CCReSD: concept-based categorisation of Hidden Web databases

Authors:
Yih-Ling Hedley; Muhammad Younas; Anne James
Affiliations:
Faculty of Engineering and Computing, Coventry University, Coventry CV1 5FB, UK; Department of Computing, Oxford Brookes University, Oxford OX33 1HX, UK; Faculty of Engineering and Computing, Coventry University, Coventry CV1 5FB, UK
Venue:
International Journal of High Performance Computing and Networking
Year:
2007



Abstract

Hidden Web databases generate results dynamically in response to users' queries. Categorising such databases under a category scheme is widely used to support information searches. We present Concept-based Categorisation over Refined Sampled Documents (CCReSD), an approach that handles information extraction, summarisation and categorisation of such databases. CCReSD detects and extracts query-related information from documents sampled from a database, and generates terms and their frequencies to summarise the database's contents. It also generates descriptions of concepts from their coverage and specificity in a category scheme. Experiments evaluating the approach show that it assigns databases to more relevant subject categories.
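
The summarisation step mentioned in the abstract (terms and frequencies generated from sampled documents) can be illustrated with a minimal sketch. This is not the authors' implementation: the tokeniser, the stop-word list, the function names and the two sample documents are all assumptions made here for illustration, and the abstract does not say how CCReSD refines documents or weights terms.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def summarise(sampled_docs):
    """Return (term_freq, doc_freq) counters over the sampled documents.

    term_freq counts every occurrence of a term; doc_freq counts the
    number of sampled documents in which the term appears at least once.
    """
    term_freq = Counter()
    doc_freq = Counter()
    for doc in sampled_docs:
        tokens = tokenize(doc)
        term_freq.update(tokens)
        doc_freq.update(set(tokens))
    return term_freq, doc_freq


# Made-up sample documents standing in for results sampled from a database.
docs = [
    "Treatment of heart disease in adults",
    "Heart surgery and the risk of stroke",
]
term_freq, doc_freq = summarise(docs)
print(term_freq.most_common(3))
```

A summary of this kind could then be compared against the terms describing each concept in a category scheme; how CCReSD derives coverage and specificity for that comparison is not given in the abstract, so it is left out of the sketch.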