Analyzing Anchor-Links to Extract Semantic Inferences of a Web Page
ICIT '07 Proceedings of the 10th International Conference on Information Technology
Because an anchor in an HTML document points to a related document, picture, or media application, anchor text is a potential source of information about the associated web page. Sometimes, however, the anchor text is absent altogether, or the anchor tag contains only a single word or an image. In these situations, the text surrounding the link, or link-context, becomes important because it can be used to derive the context of the target web page. In this paper, a dataset of about 100 web pages of different categories from the Open Directory Project (ODP) is surveyed and analysed. The results show that cohesive text surrounding the anchor, in the form of full sentences, and non-cohesive text present elsewhere in the in-link web pages provide rich semantic information about a target web page, which in turn can be regarded as the context of that page. Since a target web page generally has several in-links, a filtering mechanism based on linguistic analysis of all context sentences, which selects the context sentence that best describes the target, is developed, described, and evaluated in this paper.
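To make the notion of link-context concrete, the following is a minimal sketch of pulling the sentence that encloses each anchor out of an in-link page. It is not the method evaluated in the paper: the names `LinkContextParser` and `link_contexts` are hypothetical, sentence boundaries are approximated by the nearest `.`, `!`, or `?`, and the linguistic filtering step is not attempted.

```python
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Collects page text and records where each anchor's text sits in it."""

    def __init__(self):
        super().__init__()
        self.chunks = []    # text pieces in document order
        self.length = 0     # number of characters collected so far
        self.links = []     # (href, start_offset, end_offset) per anchor
        self._open = None   # (href, start_offset) of the anchor being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._open = (href, self.length)

    def handle_endtag(self, tag):
        if tag == "a" and self._open is not None:
            href, start = self._open
            self.links.append((href, start, self.length))
            self._open = None

    def handle_data(self, data):
        self.chunks.append(data)
        self.length += len(data)


def link_contexts(html):
    """Return (href, anchor_text, enclosing_sentence) for each link.

    Assumption: a "sentence" is the text between the nearest sentence-ending
    punctuation marks on either side of the anchor. This is a crude stand-in
    for real sentence segmentation.
    """
    parser = LinkContextParser()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    results = []
    for href, start, end in parser.links:
        # Left boundary: just after the last '.', '!' or '?' before the anchor.
        left = max(text.rfind(c, 0, start) for c in ".!?") + 1
        # Right boundary: the first '.', '!' or '?' at or after the anchor end.
        hits = [i for i in (text.find(c, end) for c in ".!?") if i != -1]
        right = min(hits) + 1 if hits else len(text)
        sentence = " ".join(text[left:right].split())
        anchor = " ".join(text[start:end].split())
        results.append((href, anchor, sentence))
    return results
```

For an image-only anchor the returned anchor text is empty while the enclosing sentence is still available, which is the case where link-context matters most. Choosing the best sentence among several in-links would need an additional ranking step on top of this output.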