This paper presents an experimental study of the automatic correction of broken (dead) Web links, focusing in particular on links broken by the relocation of Web pages. Our first contribution is an algorithm that incorporates a comprehensive set of heuristics, some of them novel, in a single unified framework. Our second contribution is a relatively large-scale experiment, whose analysis revealed the characteristics of the problem of finding moved Web pages. We demonstrated empirically that searching for moved pages differs from typical information retrieval problems. First, the final destination cannot be identified until the page has actually moved, so the index-server approach is not necessarily effective. Second, there is a strong bias in where the new address is likely to be, so crawler-based solutions can be implemented effectively without searching the entire Web. We analyzed the experimental results in detail to show how important each heuristic is in real Web settings, and we conducted statistical analyses showing that our algorithm correctly finds the new addresses of more than 70% of broken links at the 95% confidence level.
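The locality argument above (the new address tends to lie close to the old one, so a bounded crawl can replace a Web-wide search) can be illustrated with a minimal sketch. This is not the paper's algorithm or its heuristics: the in-memory `WEB` snapshot, the choice of parent directories of the dead URL as crawl seeds, word-level Jaccard similarity as the matching score, and the `max_pages` and `threshold` parameters are all hypothetical assumptions made for illustration.

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical in-memory snapshot of a small part of the Web:
# URL -> (page text, outgoing links). A real system would fetch pages over HTTP.
WEB = {
    "http://example.org/": ("Example home", ["http://example.org/people/"]),
    "http://example.org/people/": (
        "People directory",
        ["http://example.org/people/alice/index.html"],
    ),
    "http://example.org/people/alice/index.html": (
        "Alice Smith homepage research on hypertext link maintenance and web archiving",
        [],
    ),
}

# Hypothetical cached text of the page before it moved.
OLD_TEXT = "Alice Smith homepage research on hypertext link maintenance"


def parent_urls(url):
    """Yield the directory prefixes of url, nearest first (/a/b/c.html -> /a/b/, /a/, /)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    for i in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:i]) + ("/" if i else "")
        yield f"{parts.scheme}://{parts.netloc}{path}"


def jaccard(a, b):
    """Word-set Jaccard similarity: an assumed stand-in for a real page-matching score."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def find_moved_page(old_url, old_text, web, max_pages=100, threshold=0.5):
    """Breadth-first crawl seeded at the dead URL's parent directories.

    Returns (best_url, score), or (None, best_score) if nothing reaches threshold.
    """
    queue = deque(parent_urls(old_url))  # nearest directories are explored first
    seen, visited = set(), 0
    best_url, best_score = None, 0.0
    while queue and visited < max_pages:
        url = queue.popleft()
        if url in seen or url not in web:  # skip repeats and unreachable URLs
            continue
        seen.add(url)
        visited += 1
        text, links = web[url]
        score = jaccard(old_text, text)
        if url != old_url and score > best_score:
            best_url, best_score = url, score
        queue.extend(link for link in links if link not in seen)
    if best_score >= threshold:
        return best_url, best_score
    return None, best_score


if __name__ == "__main__":
    print(find_moved_page("http://example.org/~alice/index.html", OLD_TEXT, WEB))
```

Seeding the crawl at the nearest surviving directories and capping it with `max_pages` is what keeps the search local; the similarity threshold guards against proposing an unrelated page when the moved page lies outside the crawled neighborhood.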