On estimating the scale of national deep web

Authors:
Denis Shestakov; Tapio Salakoski
Affiliations:
Turku Centre for Computer Science, University of Turku, Turku, Finland; Turku Centre for Computer Science, University of Turku, Turku, Finland
Venue:
DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications
Year:
2007

Abstract

With advances in web technologies, more and more information on the Web is contained in dynamically generated web pages. Among the several types of web "dynamism", the most important is the case in which web pages are generated as results of queries submitted via web search forms to databases available online. These pages constitute the portion of the Web known as the deep Web. Existing estimates of the deep Web are predominantly based on studies of English-language deep web sites. The key parameters of non-English segments of the deep Web have not been investigated so far. Thus, the currently known characteristics of the deep Web may be biased, especially given the steady increase in non-English web content. In this paper, we survey the part of the deep Web consisting of dynamic pages in one particular national domain. We estimate the scale of the national deep Web using the proposed sampling techniques, and we report our observations and findings based on experiments conducted in summer 2005.
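
The abstract does not describe the proposed sampling techniques, so the sketch below is only a generic illustration of how a sample-based estimate of this kind can be computed, not the authors' method. It assumes a simple random sample of hosts drawn from a national domain, each checked for a deep web search interface. The function name, its parameters, and the numbers in the usage example are all hypothetical.

```python
import math


def estimate_total(population_size, sample_size, sample_hits, z=1.96):
    """Extrapolate a sample proportion to a whole population.

    Assumption (not from the paper): hosts are drawn by simple random
    sampling without replacement, and sample_hits of them were found to
    expose a searchable database.

    Returns (estimate, lower, upper), where the bounds come from a
    normal-approximation confidence interval (z=1.96 gives about 95%).
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be in 1..population_size")
    if not 0 <= sample_hits <= sample_size:
        raise ValueError("sample_hits must be in 0..sample_size")

    p = sample_hits / sample_size

    # Finite population correction: the sample is drawn without replacement.
    if population_size > 1:
        fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    else:
        fpc = 0.0

    std_err = math.sqrt(p * (1 - p) / sample_size) * fpc

    # Clamp the proportion bounds to the valid range [0, 1].
    lower = max(0.0, p - z * std_err)
    upper = min(1.0, p + z * std_err)

    return population_size * p, population_size * lower, population_size * upper


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    est, low, high = estimate_total(
        population_size=1000, sample_size=100, sample_hits=10
    )
    print(f"estimated deep web sites: {est:.0f} (interval {low:.0f} to {high:.0f})")
```

The normal approximation is a convenient default but becomes unreliable when the sample contains very few hits, which is likely when deep web sites are rare among the sampled hosts. An exact binomial or Wilson interval would be a more careful choice in that case.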