The Web is a rich source of information, but this information is scattered across, and hidden within, a great diversity of web pages. Search engines are windows onto the Web. However, current search engines, which are designed to identify pages containing specified phrases, have very limited power. For example, they cannot search for phrases related in a particular way (e.g. books and their authors).

In this paper we present a solution for identifying sets of inter-related information on the Web using the duality concept. Duality problems arise when one tries to identify pairs of inter-related phrases, such as (book, author), (name, email) or (acronym, expansion) relations. We propose a solution that iteratively refines mutually dependent approximations of the two sides of such a problem. Specifically, we iteratively refine i) pairs of phrases related in a specific way, and ii) the patterns of their occurrences in web pages, i.e. the ways in which the related phrases are marked in the pages. We shed light on the general solution of duality problems on the Web by concentrating on one paradigmatic duality problem: identifying (acronym, expansion) pairs in terms of the patterns of their occurrences in web pages. The solution to this problem involves two mutually dependent dualities: 1) the duality between the related pairs and their patterns, and 2) the duality between the related pairs and the acronym formulation rules.
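The iterative refinement described above can be illustrated with a minimal sketch. It is not the paper's implementation. The function names, the regular expressions, the toy pages and the single acronym formulation rule used here (the acronym is the sequence of initials of the expansion's words) are all assumptions made for illustration. A pattern is reduced to the text between an expansion and its acronym plus one trailing character.

```python
import re

# Assumed shapes: an expansion is a run of capitalized words, an acronym is 2+ capitals.
EXPANSION = r"((?:[A-Z][a-z]+ )+[A-Z][a-z]+)"
ACRONYM = r"([A-Z]{2,})"


def follows_rule(acronym, expansion):
    """Assumed formulation rule: acronym == initials of the expansion's words."""
    return "".join(word[0] for word in expansion.split()) == acronym


def learn_patterns(pairs, pages):
    """Pairs -> patterns: record how known pairs are marked in the pages."""
    patterns = set()
    for acronym, expansion in pairs:
        occurrence = re.compile(
            re.escape(expansion) + r"(\W{1,3})" + re.escape(acronym) + r"(\W?)"
        )
        for page in pages:
            for match in occurrence.finditer(page):
                patterns.add((match.group(1), match.group(2)))
    return patterns


def apply_patterns(patterns, pages):
    """Patterns -> pairs: propose candidates, keep those that satisfy the rule."""
    pairs = set()
    for middle, suffix in patterns:
        regex = re.compile(EXPANSION + re.escape(middle) + ACRONYM + re.escape(suffix))
        for page in pages:
            for match in regex.finditer(page):
                words, acronym = match.group(1).split(), match.group(2)
                # Keep only as many trailing words as the acronym has letters.
                expansion = " ".join(words[-len(acronym):])
                if follows_rule(acronym, expansion):
                    pairs.add((acronym, expansion))
    return pairs


def mine_acronyms(seed_pairs, pages, max_rounds=5):
    """Alternate between the two refinements until neither set grows."""
    pairs, patterns = set(seed_pairs), set()
    for _ in range(max_rounds):
        new_patterns = patterns | learn_patterns(pairs, pages)
        new_pairs = pairs | apply_patterns(new_patterns, pages)
        if new_pairs == pairs and new_patterns == patterns:
            break
        pairs, patterns = new_pairs, new_patterns
    return pairs, patterns


# Toy data (hypothetical pages, one seed pair).
pages = [
    "The World Wide Web (WWW) grew quickly.",
    "Every Local Area Network (LAN) needs a switch.",
    "A Local Area Network [LAN] is small.",
    "The Domain Name System [DNS] maps names to addresses.",
]
seed = {("WWW", "World Wide Web")}
found_pairs, found_patterns = mine_acronyms(seed, pages)
print(sorted(found_pairs))
print(sorted(found_patterns))
```

On this toy data the seed pair yields the parenthesis pattern, which uncovers the LAN pair. That pair also occurs in square brackets, so a second pattern is learned, which in turn uncovers the DNS pair. The formulation-rule check plays the role of the second duality: it trims over-long candidates such as "The Domain Name System" and rejects candidates whose initials do not match.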