A lexical signature (LS) is a small set of terms derived from a document that captures the "aboutness" of that document. An LS generated from a web page can be used to discover that page at a different URL as well as to find relevant pages on the Internet. For a set of randomly selected URLs, we retrieved all of their copies from the Internet Archive between 1996 and 2007 and generated LSs from them. An overlap analysis of the terms in all LSs showed only small overlaps in the early years (1996 to 2000) but increasing overlap in the more recent past (from 2003 on). We measured the performance of all LSs as a function of the number of terms they consist of. LSs created more recently perform better than early LSs created between 1996 and 2000, and all LSs created from 2000 on show a similar pattern in their performance curves. Our results show that 5-, 6- and 7-term LSs perform best, returning the URLs of interest in the top ten of the result set. In about 50% of all cases these URLs are returned as the number one result, and in 30% of all cases we considered the URLs as not discovered.
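The abstract does not spell out how an LS is computed or how a result is scored. As a minimal sketch, assuming the common TF-IDF approach (keep the k terms with the highest term frequency times inverse document frequency) and assuming that "not discovered" means the URL of interest is missing from the top ten results, the three steps mentioned above (generating an LS, measuring term overlap between LSs, and bucketing the rank of the URL of interest) could look like this. The function names, the `doc_freq` input and the rank buckets are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter


def lexical_signature(doc_terms, doc_freq, n_docs, k=5):
    """Return the k terms of a document with the highest TF-IDF scores.

    Assumption: TF-IDF ranking. doc_freq is a hypothetical mapping
    term -> number of corpus documents containing the term, and n_docs is
    the corpus size; the abstract does not say how IDF values are obtained.
    """
    tf = Counter(doc_terms)
    scores = {}
    for term, count in tf.items():
        df = doc_freq.get(term, 1)  # unseen terms are treated as rare (assumption)
        scores[term] = count * math.log(n_docs / df)
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def term_overlap(ls_a, ls_b):
    """Number of terms two lexical signatures share (one possible overlap measure)."""
    return len(set(ls_a) & set(ls_b))


def classify_result(result_urls, target_url):
    """Bucket the rank of target_url in a ranked list of result URLs.

    Assumed buckets: rank 1 -> "top", ranks 2-10 -> "top10",
    lower-ranked or missing -> "undiscovered".
    """
    try:
        rank = result_urls.index(target_url) + 1
    except ValueError:
        return "undiscovered"
    if rank == 1:
        return "top"
    if rank <= 10:
        return "top10"
    return "undiscovered"


if __name__ == "__main__":
    doc = "lexical signature web page lexical signature archive".split()
    df = {"lexical": 10, "signature": 20, "web": 900, "page": 800, "archive": 50}
    print(lexical_signature(doc, df, n_docs=1000, k=3))
    # ['lexical', 'signature', 'archive']
```

Comparing the signatures of successive archived copies with `term_overlap`, and passing each signature's search results to `classify_result`, mirrors the overlap and performance analyses described in the abstract; in practice the query would be sent to a real search engine, which this sketch leaves out.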