Building enriched web page representations using link paths

Authors:
Tim Weninger;ChengXiang Zhai;Jiawei Han
Affiliations:
University of Illinois Urbana-Champaign, Urbana, IL, USA;University of Illinois Urbana-Champaign, Urbana, IL, USA;University of Illinois Urbana-Champaign, Urbana, IL, USA
Venue:
Proceedings of the 23rd ACM conference on Hypertext and social media
Year:
2012

Citing 22
Cited 2

Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Image retrieval by hypertext links

Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Automatic resource compilation by analyzing hyperlink structure and associated text

WWW7 Proceedings of the seventh international conference on World Wide Web 7
The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Effective site finding using link anchor information

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Title language model for information retrieval

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Combining document representations for known-item search

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Analysis of anchor text for web search

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Anchor text mining for translation of Web queries: A transitive translation approach

ACM Transactions on Information Systems (TOIS)
Mining anchor text for query refinement

Proceedings of the 13th international conference on World Wide Web
A comparison of implicit and explicit links for web page classification

Proceedings of the 15th international conference on World Wide Web
Structured Data Extraction from the Web Based on Partial Tree Alignment

IEEE Transactions on Knowledge and Data Engineering
Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications)

Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications)
Modeling anchor text and classifying queries to enhance web document retrieval

Proceedings of the 17th international conference on World Wide Web
WebTables: exploring the power of tables on the web

Proceedings of the VLDB Endowment
Web-scale extraction of structured data

ACM SIGMOD Record
Extracting data records from the web using tag path clustering

Proceedings of the 18th international conference on World wide web
Building enriched document representations using aggregated anchor text

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Using anchor texts with their hyperlink structure for web search

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Entity relation discovery from web tables and links

Proceedings of the 19th international conference on World wide web
The importance of anchor text for ad hoc search revisited

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Growing parallel paths for entity-page discovery

Proceedings of the 20th international conference companion on World wide web

Exploring structure and content on the web: extraction and integration of the semi-structured web

Proceedings of the sixth ACM international conference on Web search and data mining
The parallel path framework for entity discovery on the web

ACM Transactions on the Web (TWEB)

Abstract

Anchor text has a history of enriching documents for a variety of tasks within the World Wide Web. Anchor texts are useful because they are similar to typical Web queries, and because they express the document's context. Therefore, it is a common practice for Web search engines to incorporate incoming anchor text into the document's standard textual representation. However, this approach will not suffice for documents with very few inlinks, and it does not incorporate the document's full context. To remedy these problems, we employ link paths, which contain anchor texts from paths through the Web ending at the document in question. We propose and study several different ways to aggregate anchor text from link paths, and we show that the information from link paths can be used to (1) improve known-item search in site-specific search, and (2) map Web pages to database records. We rigorously evaluate our proposed approach on several real-world test collections. We find that our approach significantly improves performance over baseline and existing techniques in both tasks.
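
To make the idea of a link path concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy link graph given as (source, target, anchor text) triples, collects the anchor-text sequences along paths that end at a target page, and aggregates their terms with a weight that decays with distance from the target. The graph, the function names `link_paths` and `aggregate`, the `max_len` cutoff, and the `decay` weighting are all illustrative assumptions; the abstract does not specify which aggregation schemes the paper actually studies.

```python
# Hypothetical toy link graph: (source page, target page, anchor text).
LINKS = [
    ("home", "people", "People"),
    ("people", "faculty", "Faculty"),
    ("faculty", "jane", "Jane Doe"),
    ("home", "research", "Research"),
    ("research", "jane", "Data Mining Group"),
]


def link_paths(links, target, max_len=3):
    """Return anchor-text sequences for maximal simple paths ending at `target`.

    Each path is a list of anchor texts ordered from the farthest link to the
    link that points directly at the target. Paths stop growing when they
    reach `max_len` links or when no unvisited inlink remains.
    """
    inlinks = {}
    for src, dst, anchor in links:
        inlinks.setdefault(dst, []).append((src, anchor))

    paths = []

    def walk(node, anchors, visited):
        extended = False
        if len(anchors) < max_len:
            for src, anchor in inlinks.get(node, []):
                if src not in visited:  # keep paths simple (no cycles)
                    extended = True
                    walk(src, [anchor] + anchors, visited | {src})
        if anchors and not extended:
            paths.append(anchors)

    walk(target, [], {target})
    return paths


def aggregate(paths, decay=0.5):
    """Sum term weights over all paths; assumed weight is decay ** distance,
    where distance 0 is the anchor pointing directly at the target."""
    weights = {}
    for anchors in paths:
        for distance, anchor in enumerate(reversed(anchors)):
            for term in anchor.lower().split():
                weights[term] = weights.get(term, 0.0) + decay ** distance
    return weights


paths = link_paths(LINKS, "jane")
weights = aggregate(paths)
```

In this toy graph the page `jane` has only two inlinks, yet its representation also picks up terms such as "faculty" and "research" from links farther up the paths, which is the kind of added context the abstract describes for documents with very few inlinks.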