Deriving link-context from HTML tag tree

Authors:
Gautam Pant
Affiliations:
The University of Iowa, Iowa City, IA
Venue:
DMKD '03 Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Year:
2003

Abstract

HTML anchors are often surrounded by text that seems to describe the destination page appropriately. The text surrounding a link, or the link-context, is used for a variety of tasks associated with Web information retrieval. These tasks can benefit from identifying regularities in the manner in which "good" contexts appear around links. In this paper, we describe a framework for conducting such a study. The framework serves as an evaluation platform for comparing various link-context derivation methods. We apply the framework to a sample of Web pages obtained from more than 10,000 different categories of the Open Directory Project (ODP). Our focus is on understanding the potential merits of using a Web page's tag tree structure for deriving link-contexts. We find that good link-context can be associated with tag tree hierarchy. Our results show that climbing up the tag tree when the link-context provided by greater depths is too short can provide better performance than some of the traditional techniques.
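
The idea of climbing the tag tree when the context found at a greater depth is too short can be illustrated with a minimal sketch. This is not the paper's implementation: the names `Node`, `TreeBuilder`, `link_context`, and the `min_words` threshold are hypothetical, and measuring context length in words is an assumption, since the abstract does not say how "too short" is defined. The sketch uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so the builder must not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of a simplified tag tree (hypothetical helper)."""

    def __init__(self, tag, parent=None, attrs=None):
        self.tag = tag
        self.parent = parent
        self.attrs = dict(attrs or [])
        self.children = []  # mix of Node objects and text strings

    def text(self):
        """All text in this subtree, with whitespace normalized.

        Pieces are joined with a space, so a word split by an inline
        tag would be broken in two; acceptable for a sketch.
        """
        parts = [c if isinstance(c, str) else c.text() for c in self.children]
        return " ".join(" ".join(parts).split())


class TreeBuilder(HTMLParser):
    """Builds a tag tree and records every <a> element it sees."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current, attrs)
        self.current.children.append(node)
        if tag == "a":
            self.anchors.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data)


def link_context(anchor, min_words=5):
    """Climb from the anchor toward the root until the subtree text
    has at least min_words words (assumed threshold), then return it."""
    node = anchor
    while len(node.text().split()) < min_words and node.parent is not None:
        node = node.parent
    return node.text()


page = """
<html><body>
<div><p>Unrelated intro paragraph about the weather today.</p></div>
<div><p>Read the <a href="http://example.org/crawlers">survey</a>
of topical Web crawlers.</p></div>
</body></html>
"""
builder = TreeBuilder()
builder.feed(page)
print(link_context(builder.anchors[0], min_words=5))
```

With a small threshold the anchor text alone is returned; raising `min_words` pulls in the enclosing paragraph, then the enclosing block, and eventually the whole page, which is the trade-off between context size and noise that such a study would evaluate.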