We seek to gain improved insight into how Web search engines should cope with the evolving Web, in an attempt to provide users with the most up-to-date results possible. For this purpose we collected weekly snapshots of some 150 Web sites over the course of one year, and measured the evolution of content and link structure. Our measurements focus on aspects of potential interest to search engine designers: the evolution of link structure over time, the rate of creation of new pages and new distinct content on the Web, and the rate of change of the content of existing pages under search-centric measures of degree of change. Our findings indicate a rapid turnover rate of Web pages, i.e., high rates of birth and death, coupled with an even higher rate of turnover in the hyperlinks that connect them. For pages that persist over time we found that, perhaps surprisingly, the degree of content shift as measured using TF.IDF cosine distance does not appear to be consistently correlated with the frequency of content updating. Despite this apparent non-correlation, the rate of content shift of a given page is likely to remain consistent over time. That is, pages that change a great deal in one week will likely change by a similarly large degree in the following week. Conversely, pages that experience little change will continue to experience little change. We conclude the paper with a discussion of the potential implications of our results for the design of effective Web search engines.
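The abstract measures content shift between two snapshots of a page with TF.IDF cosine distance. The following is a minimal sketch of that measure, not the paper's exact procedure. Several details are assumptions rather than facts from the paper: whitespace tokenization, weights equal to raw term frequency times IDF(t) = log(N / df(t)) computed over a supplied collection of pages, and the convention for empty vectors. The function names and sample pages are hypothetical.

```python
import math
from collections import Counter


def compute_idf(collection):
    """IDF(t) = log(N / df(t)) over a collection of token lists (assumed variant)."""
    n = len(collection)
    df = Counter()
    for tokens in collection:
        df.update(set(tokens))  # count each term at most once per page
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Raw term frequency times IDF; terms absent from `idf` get weight 0."""
    tf = Counter(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumed convention: two zero vectors count as identical,
        # a zero vector versus a non-zero one as maximally distant.
        return 0.0 if norm_u == norm_v else 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical example: one page captured in two consecutive weekly snapshots,
# plus two unrelated pages so that IDF is computed over a small collection.
week1 = "web search engine crawler index".split()
week2 = "web search engine ranking index".split()
other_pages = ["stock market news today".split(), "recipe for pancakes".split()]

idf = compute_idf([week1, week2] + other_pages)
shift = cosine_distance(tfidf_vector(week1, idf), tfidf_vector(week2, idf))
print(f"content shift: {shift:.3f}")
```

Under these assumptions the two versions share four of their five terms, and the distance comes out at about 0.5. If you computed this value for each page week by week, you could check the abstract's observation that a page's degree of change tends to stay similar from one week to the next. The IDF collection should be much larger than in this toy case, because a term that appears in every page gets zero weight.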