Broken hypertext links are a frequent problem on the Web. Sometimes the page a link points to has disappeared forever, but in many other cases it has simply been moved to another location on the same web site or to a different one. In some cases the page, besides being moved, has also been updated, so that it differs slightly from the original but remains very similar. In all these cases it is very useful to have a tool that suggests pages closely related to the broken link, from which the most appropriate one can be selected. The relationship between a broken link and its candidate replacement pages can be defined as a function of many factors. In this work we use several resources, both in the context of the link and on the Web, to look for pages related to a broken link. From the context of the link, we analyze several sources of information: the anchor text, the text surrounding the anchor, the URL, and the page containing the link. We also extract information about the link from the Web infrastructure, such as search engines, Internet archives, and social tagging systems. We combine all of these resources to design a system that recommends pages that can be used to recover the broken link. We present a novel methodology for evaluating the system without resorting to user judgments, which increases the objectivity of the results and helps to tune the parameters of the algorithm. We have also compiled a collection of web pages with true broken links, which was used for a human evaluation of the full system. Results show that the system is able to recommend the correct page among the first ten results when the page has been moved, and to recommend highly related pages when the original has disappeared.
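To make the idea of scoring candidate pages against the context of a broken link more concrete, the following is a minimal sketch and not the system described above. It assumes that candidate pages have already been retrieved (for example from a search engine) and ranks them by plain term-frequency cosine similarity against the anchor text and the text surrounding the anchor. The function names, the `anchor_weight` parameter, and the choice of cosine similarity are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(anchor_text, surrounding_text, candidates, anchor_weight=2):
    """Rank candidate pages (a dict of url -> page text) for a broken link.

    anchor_weight is a hypothetical knob (not from the source): each anchor
    term is counted that many times, so it outweighs surrounding-text terms.
    """
    query = Counter(tokenize(anchor_text) * anchor_weight
                    + tokenize(surrounding_text))
    scored = [(cosine(query, Counter(tokenize(text))), url)
              for url, text in candidates.items()]
    # Highest similarity first; ties keep the candidates' original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]
```

A call such as `rank_candidates("annual report", "download the 2009 finance summary", pages)` returns the URLs in `pages` ordered from most to least similar to the link context. A fuller treatment would also draw on the URL, the page containing the link, and external resources such as Internet archives and social tagging systems, which this sketch leaves out.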