Constructing a reliable Web graph with information on browsing behavior

Authors:
Yiqun Liu;Yufei Xue;Danqing Xu;Rongwei Cen;Min Zhang;Shaoping Ma;Liyun Ru
Venue:
Decision Support Systems
Year:
2012

Abstract

Page quality estimation is one of the greatest challenges for Web search engines. Hyperlink analysis algorithms such as PageRank and TrustRank are usually adopted for this task. However, low-quality, unreliable, and even spam data in the Web hyperlink graph make it increasingly difficult to estimate page quality effectively. By analyzing large-scale user browsing behavior logs, we found that a more reliable Web graph can be constructed by incorporating browsing behavior information. The experimental results show that hyperlink graphs constructed with the proposed methods are much smaller than the original graph. In addition, algorithms based on the proposed "surfing with prior knowledge" model obtain better estimation results with these graphs in both high-quality page identification and spam page identification tasks. Hyperlink graphs constructed with the proposed methods therefore support more precise Web page quality evaluation with less computational effort.