Ad hoc passage retrieval within the Wikipedia is examined in the context of INEX 2007. An analysis of the INEX 2006 assessments suggests that a fixed-size window of about 300 terms is consistently seen and that this might be a good retrieval strategy. In runs submitted to INEX, potentially relevant documents were identified using BM25 (trained on INEX 2006 data). For each potentially relevant document, the location of every search term was identified and the center (mean) located. A fixed-size window was then centered on this location. A method of removing outliers was examined in which all terms occurring more than one standard deviation from the center were considered outliers and the center was recomputed without them. Both techniques were examined with and without stemming.

For Wikipedia linking, we identified terms within the document that were over-represented and, from the top few, generated queries of different lengths. A BM25 ranking search engine was used to identify potentially relevant documents. Links from the source document to the potentially relevant documents (and back) were constructed (at a granularity of whole document). The best-performing run used the 4 most over-represented search terms to retrieve 200 documents, and the next 4 to retrieve 50 more.
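The window-placement step of the passage retrieval runs (mean of the search-term locations, optionally recomputed after discarding terms more than one standard deviation away) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the use of the population standard deviation, the measurement of positions as term offsets, and the clamping of the window to the document boundaries are all assumptions made for illustration; the abstract specifies none of them.

```python
from statistics import mean, pstdev


def window_center(positions, remove_outliers=False):
    """Mean position of the search-term hits in one document.

    With remove_outliers=True, hits farther than one standard deviation
    from the initial mean are dropped and the mean is recomputed.
    Assumption: population standard deviation (the source does not say).
    """
    if not positions:
        raise ValueError("no search-term occurrences in document")
    center = mean(positions)
    if remove_outliers and len(positions) > 1:
        sd = pstdev(positions)
        kept = [p for p in positions if abs(p - center) <= sd]
        if kept:  # guard against an empty list from rounding effects
            center = mean(kept)
    return center


def fixed_window(positions, doc_length, size=300, remove_outliers=False):
    """Return (start, end) term offsets of a fixed-size window.

    The window is centered on the (possibly outlier-free) mean hit
    position. Assumption: the window is shifted to stay inside the
    document and truncated if the document is shorter than `size`.
    """
    center = window_center(positions, remove_outliers)
    start = int(center) - size // 2
    start = max(0, min(start, max(0, doc_length - size)))
    end = min(doc_length, start + size)
    return start, end


if __name__ == "__main__":
    hits = [100, 110, 120, 900]  # hypothetical term offsets
    print(fixed_window(hits, doc_length=1000))
    print(fixed_window(hits, doc_length=1000, remove_outliers=True))
```

In this hypothetical example the single late hit at offset 900 pulls the plain mean to 307.5, so the window covers offsets 157 to 457. With outlier removal that hit is discarded, the mean moves to 110, and the window is clamped to the start of the document.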