Anchor text has a long history of enriching document representations for a variety of tasks on the World Wide Web. Anchor texts are useful because they resemble typical Web queries and because they express the document's context. It is therefore common practice for Web search engines to incorporate incoming anchor text into a document's standard textual representation. However, this approach is insufficient for documents with very few inlinks, and it does not capture the document's full context. To remedy these problems, we employ link paths, which contain the anchor texts from paths through the Web that end at the document in question. We propose and study several ways to aggregate anchor text from link paths, and we show that the information from link paths can be used to (1) improve known-item search in site-specific search, and (2) map Web pages to database records. We rigorously evaluate our proposed approach on several real-world test collections. We find that our approach significantly improves performance over baselines and existing techniques in both tasks.
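The idea of a link path can be illustrated with a small sketch. The abstract does not specify how anchor text is aggregated, so everything below is an assumption made for illustration: the edge-list input format, the hypothetical helpers `link_paths` and `aggregate`, the path-length cap `max_len`, and the distance-based `decay` weighting are not taken from the paper.

```python
from collections import Counter, defaultdict


def link_paths(edges, target, max_len=2):
    """Return anchor-text sequences for paths of at most max_len links ending at target.

    edges is a list of (source, destination, anchor_text) triples (assumed format).
    Each returned path lists anchors in traversal order, so the last anchor
    is the one on the link that points directly at the target.
    """
    incoming = defaultdict(list)
    for src, dst, anchor in edges:
        incoming[dst].append((src, anchor))

    paths = []

    def walk(node, anchors, visited):
        for src, anchor in incoming[node]:
            if src in visited:
                continue  # do not revisit pages already on this path
            path = [anchor] + anchors
            paths.append(path)
            if len(path) < max_len:
                walk(src, path, visited | {src})

    walk(target, [], {target})
    return paths


def aggregate(paths, decay=0.5):
    """Build a weighted bag of terms from link paths (illustrative weighting only).

    Every path extends a shorter one by a single link, so only the farthest
    anchor (path[0]) is new; it is weighted by decay ** (number of links
    between it and the target).
    """
    weights = Counter()
    for path in paths:
        distance = len(path) - 1
        for term in path[0].lower().split():
            weights[term] += decay ** distance
    return weights


# Hypothetical three-page site used only to exercise the sketch.
edges = [
    ("home", "products", "Products"),
    ("products", "widget", "Blue Widget"),
    ("home", "widget", "featured widget"),
]
paths = link_paths(edges, "widget", max_len=2)
terms = aggregate(paths)
```

In this toy example the page `widget` has only two inlinks, yet the path through `products` contributes the additional term "products" at a reduced weight, which is the kind of extra context that plain inlink anchor text would miss.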