Modeling the web as a hypergraph to compute page reputation

  • Authors and affiliations:
  • Klessius Berlt (Department of Computer Science, Federal University of Amazonas, Manaus, Brazil)
  • Edleno Silva de Moura (Department of Computer Science, Federal University of Amazonas, Manaus, Brazil)
  • André Carvalho (Department of Computer Science, Federal University of Amazonas, Manaus, Brazil)
  • Marco Cristo (FUCAPI, Analysis, Research and Technological Innovation Center, Manaus, Brazil)
  • Nivio Ziviani (Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil)
  • Thierson Couto (Institute of Informatics, Federal University of Goiás, Goiânia, Brazil)

  • Venue:
  • Information Systems
  • Year:
  • 2010

Abstract

In this work we propose a model that represents the web as a directed hypergraph (instead of a graph), where links connect pairs of disjoint sets of pages. The web hypergraph is derived from the web graph by dividing the set of pages into non-overlapping blocks and using the links between pages of distinct blocks to create hyperarcs. A hyperarc connects a block of pages to a single page, in order to provide more reliable information for link analysis. We use the hypergraph model to create the hypergraph versions of the Pagerank and Indegree algorithms, referred to as HyperPagerank and HyperIndegree, respectively. The hypergraph is derived from the web graph by grouping pages according to two different partition criteria: pages that belong to the same web host, or pages that belong to the same web domain. We compared the original page-based algorithms with their host-based and domain-based versions, considering a combination of page reputation, the textual content of the pages, and the anchor text. Experimental results using three distinct web collections show that the HyperPagerank and HyperIndegree algorithms may yield better results than the original graph versions of the Pagerank and Indegree algorithms. We also show that the hypergraph versions of the algorithms were slightly less affected by noisy links and spamming.
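To make the construction concrete, below is a minimal Python sketch of one plausible reading of the model described in the abstract: pages are grouped into blocks by web host, intra-block links are discarded, all page-level links from one block to the same target page collapse into a single hyperarc, HyperIndegree counts the distinct source blocks of a page, and a simplified HyperPagerank-style power iteration propagates a block's aggregate score along its outgoing hyperarcs. All URLs are illustrative, and the exact rank-propagation rule is an assumption for illustration, not the paper's implementation.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Toy page-level web graph: directed links (source URL -> target URL).
# These URLs are hypothetical, chosen only to exercise the construction.
links = [
    ("http://a.example.com/1", "http://news.example.org/x"),
    ("http://a.example.com/2", "http://news.example.org/x"),
    ("http://b.example.net/1", "http://news.example.org/x"),
    ("http://news.example.org/x", "http://a.example.com/1"),
]

def block_of(url):
    """Partition criterion: group pages by web host (the URL's netloc)."""
    return urlparse(url).netloc

# Build hyperarcs: a hyperarc connects a block of pages to a single page.
# Intra-block links are dropped, and multiple links from pages of one
# block to the same target collapse into a single hyperarc.
pages = set()
hyperarcs = set()
for src, dst in links:
    pages.update((src, dst))
    if block_of(src) != block_of(dst):
        hyperarcs.add((block_of(src), dst))

# HyperIndegree: number of distinct blocks with a hyperarc to the page.
hyper_indegree = defaultdict(int)
for _, dst in hyperarcs:
    hyper_indegree[dst] += 1

# HyperPagerank-style power iteration (simplified assumption): a block's
# score is the sum of its pages' scores, split evenly among its outgoing
# hyperarcs. Dangling blocks (no outgoing hyperarcs) would leak mass in
# general; this toy graph has none.
DAMPING, ITERATIONS = 0.85, 20
block_pages = defaultdict(set)
for p in pages:
    block_pages[block_of(p)].add(p)
out_arcs = defaultdict(list)
for blk, dst in hyperarcs:
    out_arcs[blk].append(dst)

rank = {p: 1.0 / len(pages) for p in pages}
for _ in range(ITERATIONS):
    block_rank = {blk: sum(rank[p] for p in ps) for blk, ps in block_pages.items()}
    new_rank = {p: (1 - DAMPING) / len(pages) for p in pages}
    for blk, targets in out_arcs.items():
        share = block_rank[blk] / len(targets)
        for dst in targets:
            new_rank[dst] += DAMPING * share
    rank = new_rank

print(sorted(hyper_indegree.items()))
print(sorted(rank.items(), key=lambda kv: -kv[1]))
```

Note how the two page-level links from host a.example.com to the news page contribute a single hyperarc, which is the intuition behind the abstract's claim of robustness to noisy links and spamming: many links from one block count only once.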