The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Breadth-first crawling yields high-quality pages
Proceedings of the 10th international conference on World Wide Web
Journal of Intelligent Information Systems
Mining the Web: Discovering Knowledge from HyperText Data
Mining the Web: Discovering Knowledge from HyperText Data
Deriving link-context from HTML tag tree
DMKD '03 Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Mining anchor text for query refinement
Proceedings of the 13th international conference on World Wide Web
Scoring missing terms in information retrieval tasks
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Automatic collection of related terms from the web
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 2
A web-based kernel function for measuring the similarity of short text snippets
Proceedings of the 15th international conference on World Wide Web
Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications)
Robust web page segmentation for mobile terminal using content-distances and page layout information
Proceedings of the 16th international conference on World Wide Web
Measuring semantic similarity between words using web search engines
Proceedings of the 16th international conference on World Wide Web
Extraction of anchor-related text and its evaluation by user studies
Proceedings of the 2007 conference on Human interface: Part I
Hi-index | 0.00 |
Approaches for extracting related words (terms) by co-occurrence work poorly sometimes. Two words frequently co-occurring in the same documents are considered related. However, they may not relate at all because they would have no common meanings nor similar semantics. We address this problem by considering the page designer's intention and propose a new model to extract related words. Our approach is based on the idea that the web page designers usually make the correlative hyperlinks appear in close zone on the browser. We developed a browser-based crawler to collect "geographically" near hyperlinks, then by clustering these hyperlinks based on their pixel coordinates, we extract related words which can well reflect the designer's intention. Experimental results show that our method can represent the intention of the web page designer in extremely high precision. Moreover, the experiments indicate that our extracting method can obtain related words in a high average precision.