This article presents a characterization of the community Web of the Portuguese people. We defined criteria for delimiting this Web based on our past experience crawling pages related to Portugal, and we collected over 3.2 million documents from 46,000 sites that satisfy those criteria. The characterization is derived from this crawl. We describe the rules we established for defining the boundaries of this community Web and the methodology used to gather statistics. The statistics cover the number of sites and their distribution across domains; the number, type, and size distribution of text documents; and the linkage structure of this Web. We also show how crawling constraints and abnormal situations on the Web can influence the statistics.
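The abstract does not give the delimiting criteria or the code used to compute the statistics. The following is a minimal sketch of two of the statistics it names: the domain distribution of sites and the link structure. It assumes the crawl is available as a list of document URLs and a list of (source, target) link pairs. It also assumes that a "site" is a host name and that a site's domain is the last label of that host. The function names `site_of`, `domain_distribution`, and `link_degrees` are hypothetical and are not taken from the article.

```python
from collections import Counter
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Return the site of a URL, assumed here to be its host name.

    urlsplit(...).hostname is already lowercased and excludes any port,
    so URLs differing only in port map to the same site.
    """
    return urlsplit(url).hostname or ""


def domain_distribution(urls):
    """Count distinct sites per top-level domain (last label of the host).

    Many documents from one site count as a single site, so this is a
    per-site statistic rather than a per-document one. Numeric IP hosts
    are not special-cased in this sketch.
    """
    sites = {site_of(u) for u in urls}
    sites.discard("")  # skip URLs without a host
    return Counter(host.rsplit(".", 1)[-1] for host in sites)


def link_degrees(links):
    """Return (in_degree, out_degree) Counters from (source, target) pairs.

    Duplicate links between the same pair of documents are counted once.
    """
    unique = set(links)
    out_deg = Counter(src for src, _ in unique)
    in_deg = Counter(dst for _, dst in unique)
    return in_deg, out_deg


if __name__ == "__main__":
    urls = [
        "http://www.example.pt/a.html",
        "http://www.example.pt/b.html",
        "http://news.example.com/",
        "http://uni.example.pt:8080/x",
    ]
    print(domain_distribution(urls))  # Counter({'pt': 2, 'com': 1})
```

The sketch deduplicates sites before counting because the abstract reports sites and documents as separate statistics. Under the host-name assumption, the two `www.example.pt` documents therefore count as one site. A real crawler might define sites differently, for example by merging `example.pt` and `www.example.pt`. The article's own rules for this are not given in the abstract.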