Missing web pages (pages that return the 404 "Page Not Found" error) are part of the browsing experience. Manually using search engines to rediscover missing pages can be frustrating and unsuccessful. We compare four automated methods for rediscovering web pages: we extract the page's title, generate the page's lexical signature (LS), obtain the page's tags from the bookmarking website delicious.com, and generate an LS from the page's link neighborhood. We use the output of each method to query Internet search engines and analyze the retrieval performance. Our results show that both LSs and titles perform fairly well, with over 60% of URIs returned top-ranked by Yahoo!. Combining the methods improves retrieval performance further. Considering the complexity of LS generation, the preferable setup is to query with the title first and, if the results are insufficient, to query with the LS second. This combination returns more than 75% of URIs top-ranked.
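The title-first, LS-second setup can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes an LS is the k highest-scoring terms under a smoothed TF-IDF weighting, that document frequencies come from some external corpus, and that "insufficient results" means the sought URI is absent from the top n results, as in an evaluation setting where the URI is known. The names `lexical_signature`, `rediscover`, `doc_freq`, and `search` are hypothetical, and `search` stands in for whatever search engine interface is available.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_signature(page_text: str, doc_freq: Dict[str, int],
                      corpus_size: int, k: int = 5) -> List[str]:
    """Return the k terms with the highest TF-IDF score (assumed weighting).

    doc_freq maps a term to the number of corpus documents containing it;
    in practice these counts would have to be estimated from some corpus.
    """
    tf = Counter(tokenize(page_text))

    def score(term: str) -> float:
        # Smoothed IDF so that unseen terms do not divide by zero.
        idf = math.log((corpus_size + 1) / (doc_freq.get(term, 0) + 1))
        return tf[term] * idf

    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(tf, key=lambda t: (-score(t), t))[:k]


def rediscover(uri: str, title: str, signature: List[str],
               search: Callable[[str], List[str]],
               top_n: int = 10) -> Optional[Tuple[str, int]]:
    """Query with the title first, then with the LS if the title fails.

    Returns (method, rank) for the first method that places `uri` in the
    top_n results, or None if neither does. `search` is a hypothetical
    callable that maps a query string to a ranked list of URIs.
    """
    for method, query in (("title", title), ("ls", " ".join(signature))):
        results = search(query)[:top_n]
        if uri in results:
            return method, results.index(uri) + 1
    return None
```

The cheap title query runs first, and the costlier LS is consulted only when the title fails. A stricter variant could precompute nothing and generate the LS lazily inside the fallback branch.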