Measuring similarity to detect qualified links

Authors:
Xiaoguang Qi;Lan Nie;Brian D. Davison
Affiliations:
Lehigh University;Lehigh University;Lehigh University
Venue:
AIRWeb '07 Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
Year:
2007

Citing 20
Cited 11

Improved algorithms for topic distillation in a hyperlinked environment

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Automatic resource compilation by analyzing hyperlink structure and associated text

WWW7 Proceedings of the seventh international conference on World Wide Web 7
The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Making large-scale support vector machine learning practical

Advances in kernel methods
Authoritative sources in a hyperlinked environment

Journal of the ACM (JACM)
IR evaluation methods for retrieving highly relevant documents

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
The stochastic approach for link-structure analysis (SALSA) and the TKC effect

Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications networking
Improvement of HITS-based algorithms on web documents

Proceedings of the 11th international conference on World Wide Web
Information Retrieval

Information Retrieval
Block-level link analysis

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Spam, damn spam, and statistics: using statistical analysis to locate spam web pages

Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004
Identifying link farm spam pages

WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Fast webpage classification using URL features

Proceedings of the 14th ACM international conference on Information and knowledge management
The SMART Retrieval System—Experiments in Automatic Document Processing

The SMART Retrieval System—Experiments in Automatic Document Processing
Site level noise removal for search engines

Proceedings of the 15th international conference on World Wide Web
Detecting spam web pages through content analysis

Proceedings of the 15th international conference on World Wide Web
Detecting nepotistic links by language model disagreement

Proceedings of the 15th international conference on World Wide Web
Stanford WebBase components and applications

ACM Transactions on Internet Technology (TOIT)
Combating web spam with trustrank

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Thwarting the nigritude ultramarine: learning to identify link spam

ECML'05 Proceedings of the 16th European conference on Machine Learning

Adversarial Information Retrieval on the Web (AIRWeb 2007)

ACM SIGIR Forum
A study of link farm distribution and evolution using a time series of web snapshots

Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
Web spam identification through language model analysis

Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
Use noisy link analysis to improve web search

Proceedings of the 20th ACM conference on Hypertext and hypermedia
Identifying spam link generators for monitoring emerging web spam

Proceedings of the 4th workshop on Information credibility
Web spam detection: new classification features based on qualified link analysis and language models

IEEE Transactions on Information Forensics and Security
Querying the web graph

SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Combating link spam by noisy link analysis

ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications: Part I
Adversarial Web Search

Foundations and Trends in Information Retrieval
Bridging link and query intent to enhance web search

Proceedings of the 22nd ACM conference on Hypertext and hypermedia
Learning resources in federated environments: a broken link checker based on URL similarity

International Journal of Metadata, Semantics and Ontologies

Abstract

The early success of link-based ranking algorithms was predicated on the assumption that links imply merit of the target pages. However, today many links exist for purposes other than to confer authority. Such links bring noise into link analysis and harm the quality of retrieval. To provide high-quality search results, it is important to detect them and reduce their influence. In this paper, a method is proposed to detect such links by considering multiple similarity measures over the source pages and target pages. With the help of a classifier, these noisy links are detected and dropped. After that, link analysis algorithms are performed on the reduced link graph. The usefulness of a number of features is also tested. Experiments across 53 query-specific datasets show our approach almost doubles the performance of Kleinberg's HITS and boosts Bharat and Henzinger's imp algorithm by close to 9% in terms of precision. It also outperforms a previous approach focusing on link farm detection.
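
The pipeline described in the abstract (compute similarity features per link, drop links judged noisy, then run link analysis on the reduced graph) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two features (cosine similarity of page text and Jaccard overlap of URL tokens), the example pages, and in particular the `classify` callable, which is a hand-written threshold rule standing in for the trained classifier. The link analysis step is a plain HITS power iteration.

```python
import math
import re
from collections import Counter


def cosine(text_a, text_b):
    """Cosine similarity between raw term-frequency vectors of two texts."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values()) * sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def jaccard(url_a, url_b):
    """Jaccard overlap between the alphanumeric tokens of two URLs."""
    ta = set(re.findall(r"[a-z0-9]+", url_a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", url_b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def link_features(pages, src, dst):
    """Hypothetical feature vector for one link: (content similarity, URL similarity)."""
    return (cosine(pages[src], pages[dst]), jaccard(src, dst))


def drop_noisy_links(pages, links, classify):
    """Keep only the links that `classify` accepts as qualified."""
    return [(s, t) for s, t in links if classify(link_features(pages, s, t))]


def hits(nodes, links, iterations=50):
    """HITS by power iteration; returns (authority, hub) score dicts."""
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        auth = {n: 0.0 for n in nodes}
        for s, t in links:
            auth[t] += hub[s]          # authority: sum of hub scores of in-linking pages
        hub = {n: 0.0 for n in nodes}
        for s, t in links:
            hub[s] += auth[t]          # hub: sum of authority scores of linked-to pages
        for scores in (auth, hub):     # L2-normalize both score vectors
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return auth, hub


# Toy data (invented for this sketch): URL -> page text.
pages = {
    "http://trails.example/guide": "hiking trails mountain guide",
    "http://gear.example/hiking": "mountain hiking gear guide",
    "http://ads.example/promo": "casino bonus pills discount",
}
links = [
    ("http://trails.example/guide", "http://gear.example/hiking"),
    ("http://gear.example/hiking", "http://trails.example/guide"),
    ("http://ads.example/promo", "http://trails.example/guide"),
    ("http://ads.example/promo", "http://gear.example/hiking"),
]

# Stand-in for the trained classifier: accept a link if content similarity is at least 0.1.
kept = drop_noisy_links(pages, links, classify=lambda f: f[0] >= 0.1)
authority, hub = hits(list(pages), kept)
```

In this toy graph the two links from the unrelated page are dropped, so it contributes no hub weight to the pages it points at. In practice the threshold rule would be replaced by a classifier trained on labeled links, and which features matter is exactly what the paper evaluates.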