Extracting Related Words from Anchor Text Clusters by Focusing on the Page Designer's Intention

Authors:
Jianquan Liu;Hanxiong Chen;Kazutaka Furuse;Nobuo Ohbo
Affiliations:
Department of Computer Science, Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki-ken, Japan 305-8577 (all four authors)
Venue:
DEXA '09 Proceedings of the 20th International Conference on Database and Expert Systems Applications
Year:
2009

Abstract

Approaches that extract related words (terms) from co-occurrence sometimes perform poorly. They treat two words that frequently co-occur in the same documents as related, yet such words may be unrelated, sharing neither a common meaning nor similar semantics. We address this problem by taking the page designer's intention into account and propose a new model for extracting related words. Our approach rests on the observation that web page designers usually place correlated hyperlinks close to one another in the page as rendered by the browser. We developed a browser-based crawler that collects these "geographically" close hyperlinks. By clustering the hyperlinks on their pixel coordinates, we extract related words that closely reflect the designer's intention. Experimental results show that our method represents the web page designer's intention with extremely high precision, and that our extraction method obtains related words with high average precision.
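To make the idea of clustering hyperlinks "by their pixel coordinates" concrete, here is a minimal sketch. It assumes each hyperlink has already been reduced to its anchor text and the (x, y) pixel position of its rendered box, as a browser-based crawler could report. The abstract does not state which clustering algorithm or parameters the paper uses, so single-linkage clustering with a fixed pixel threshold is a swapped-in stand-in. The names `Anchor`, `cluster_anchors`, `related_pairs`, and `max_gap` are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Anchor:
    """Hypothetical record for a rendered hyperlink: its anchor text and
    the pixel coordinates of its box as reported by a browser."""
    text: str
    x: float
    y: float


def cluster_anchors(anchors, max_gap=40.0):
    """Single-linkage clustering (an assumed stand-in, not the paper's method):
    two anchors share a cluster if a chain of anchors links them and every hop
    is at most max_gap pixels. max_gap is an illustrative value."""
    parent = list(range(len(anchors)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(anchors)), 2):
        a, b = anchors[i], anchors[j]
        if hypot(a.x - b.x, a.y - b.y) <= max_gap:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, a in enumerate(anchors):
        clusters.setdefault(find(i), []).append(a)
    return list(clusters.values())


def related_pairs(clusters):
    """Treat anchor texts that fall in the same spatial cluster as
    candidate related words (unordered pairs, alphabetically sorted)."""
    pairs = set()
    for cluster in clusters:
        texts = sorted({a.text for a in cluster})
        pairs.update(combinations(texts, 2))
    return pairs


if __name__ == "__main__":
    # A vertical menu at the top left and a footer bar at the bottom right.
    anchors = [
        Anchor("Python", 10, 100), Anchor("Java", 10, 130), Anchor("C++", 10, 160),
        Anchor("Privacy", 600, 900), Anchor("Contact", 640, 900),
    ]
    print(sorted(related_pairs(cluster_anchors(anchors))))
    # [('C++', 'Java'), ('C++', 'Python'), ('Contact', 'Privacy'), ('Java', 'Python')]
```

Single linkage suits this illustration because menu items laid out in a row or column form chains of short gaps, so "Python" and "C++" end up together through "Java" even though they are 60 pixels apart. Unlike document-level co-occurrence, links in distant regions of the same page (the menu and the footer) are never paired.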