Automatic generation of inter-passage links based on semantic similarity
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
In this paper, we describe the methods used by CSIR in the INEX 2008 Link-the-Wiki track. For incoming link detection, we use p(d|t), the probability of generating a document d given the topic file t, to judge which documents are suitable link sources for the given topic. For the file-to-file task of outgoing link detection, we take a two-step approach: first, we identify a set of candidate target documents by literally matching the topic file title against document content; then, we rank the candidate documents by their number of incoming links. For the anchor-to-BEP task, we use p(d|a,t), the probability of generating a document d given the topic file t and an anchor name a, to select anchors and link targets for a given topic.
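The two-step file-to-file procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `rank_outgoing_targets`, the `top_k` cutoff, the case-insensitive substring test used for "literal matching", and the tie-breaking by document id are all hypothetical choices that the abstract does not specify.

```python
from typing import Dict, List


def rank_outgoing_targets(
    topic_title: str,
    documents: Dict[str, str],
    incoming_links: Dict[str, int],
    top_k: int = 10,
) -> List[str]:
    """Return candidate target document ids for a topic.

    Step 1: keep documents whose content literally contains the topic title
            (assumed here to mean a case-insensitive substring match).
    Step 2: rank the candidates by their number of incoming links, descending.
    """
    title = topic_title.lower()

    # Step 1: literal match of the topic title against document content.
    candidates = [
        doc_id for doc_id, text in documents.items() if title in text.lower()
    ]

    # Step 2: sort by incoming-link count (highest first); ties are broken
    # by document id so the result is deterministic (an assumption).
    candidates.sort(key=lambda doc_id: (-incoming_links.get(doc_id, 0), doc_id))
    return candidates[:top_k]


# Toy data, invented purely for illustration.
docs = {
    "d1": "Information retrieval is the study of finding relevant documents.",
    "d2": "A survey of information retrieval systems and their evaluation.",
    "d3": "A recipe for cooking pasta.",
}
inlinks = {"d1": 5, "d2": 12, "d3": 40}

print(rank_outgoing_targets("Information Retrieval", docs, inlinks))
# prints ['d2', 'd1']
```

In this toy run, `d3` has the most incoming links but is excluded in the first step because it never mentions the topic title; link counts only reorder documents that already passed the literal match.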