This paper presents a method for automatically generating hypertext links for electronic documents. The goal is to generate links from an arbitrary part of a document (the source of a link) to the relevant parts of target documents (the destinations). To achieve this goal, we assume that parts of documents that are relevant to each other often share words. To extract the parts that densely contain words of the source (keywords), we employ density distributions of keywords. This enables us to determine destinations simply by extracting the parts whose density exceeds a threshold. Experiments on generating links from figures/tables to parts of documents, as well as from texts to parts of different documents, show that our method with the optimal parameters yields a recall of 60% and a precision of 50%.
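The idea of thresholding a keyword density distribution can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the abstract does not specify the smoothing function, the window width, or the threshold, so the Hann-shaped window, the default `width=5`, the simple tokenizer, and all function names below are assumptions made only for illustration.

```python
import math
import re


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def hann_window(width):
    """Hann-shaped weights of the given width, normalized to sum to 1.

    Assumption: the abstract does not name a smoothing window; Hann is one
    common choice. Using width + 1 in the denominator keeps the end weights
    strictly positive.
    """
    weights = [
        0.5 * (1 - math.cos(2 * math.pi * (i + 1) / (width + 1)))
        for i in range(width)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def keyword_density(target_tokens, keywords, width=5):
    """Smoothed keyword density at every token position of the target."""
    # 1.0 where the target token is one of the source's keywords, else 0.0.
    hits = [1.0 if tok in keywords else 0.0 for tok in target_tokens]
    window = hann_window(width)
    half = width // 2
    density = []
    for pos in range(len(hits)):
        total = 0.0
        for offset, weight in enumerate(window):
            idx = pos + offset - half
            if 0 <= idx < len(hits):  # positions outside the text count as 0
                total += weight * hits[idx]
        density.append(total)
    return density


def extract_destinations(density, threshold):
    """Return (start, end) token spans, end exclusive, where density > threshold."""
    spans = []
    start = None
    for pos, value in enumerate(density):
        if value > threshold and start is None:
            start = pos
        elif value <= threshold and start is not None:
            spans.append((start, pos))
            start = None
    if start is not None:
        spans.append((start, len(density)))
    return spans
```

A possible use: take the keywords from the source, for example `set(tokenize(caption_text))` for a figure caption, compute `keyword_density(tokenize(target_text), keywords)`, and pass the result to `extract_destinations` with a chosen threshold. Raising the threshold or narrowing the window yields fewer and shorter destinations, which is the kind of parameter trade-off that recall and precision figures depend on.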