AuGEAS: authoritativeness grading, estimation, and sorting

Authors:
Ayman Farahat;Geoff Nunberg;Francine Chen
Affiliations:
Palo Alto Research Center, Palo Alto, CA;Palo Alto Research Center, Palo Alto, CA;Palo Alto Research Center, Palo Alto, CA
Venue:
Proceedings of the eleventh international conference on Information and knowledge management
Year:
2002

Citing 9
Cited 2

C4.5: programs for machine learning

C4.5: programs for machine learning
Improved algorithms for topic distillation in a hyperlinked environment

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Authoritative sources in a hyperlinked environment

Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Web Search---Your Way

Communications of the ACM
Modern Information Retrieval

Modern Information Retrieval
Measuring Search Engine Quality

Information Retrieval
Accurate methods for the statistics of surprise and coincidence

Computational Linguistics - Special issue on using large corpora: I
Cost-sensitive classification: empirical evaluation of a hybrid genetic decision tree induction algorithm

Journal of Artificial Intelligence Research

Automatic topics discovery from hyperlinked documents

Information Processing and Management: an International Journal
Identifying topical authorities in microblogs

Proceedings of the fourth ACM international conference on Web search and data mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

When searching for content in in a large heterogeneous document collections like the World Wide Web it is not easy to know which documents provide reliable authoritative information about a subject. The problem is particularly pointed as it concerns content search for "high-value" informational needs such as retrieving medical information, where the cost of error may be high. In this paper, a method is described for estimating the authoritativeness of a document based on textual, non-topical cues. This method is complementary to estimates of authoritativeness based on link structure, such as the PageRank and HITS algorithms. This method is particularly suited to "high-value" content search where the user is interested in searching for information about a specific topic. A method for combining textual estimates of authoritativeness with link analysis is also presented. The types of textual cues to authoritativeness that are easily computed and utilized by our method are described, as well as the method used to select a subset of cues to increase the computation speed. Methods for applying authoritativeness estimates to re-ranking documents returned from search engines, combining textual authoritativeness with social authority, and use in query expansion are also presented. By combining textual authority with link analysis, a more complete and robust estimate can be made of a document's authoritativeness.