Length normalization in XML retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
XML retrieval departs from standard document retrieval in that each individual XML element, ranging from italicized words or phrases to full-blown articles, is a retrievable unit. The distribution of XML element lengths differs from what we usually observe in standard document collections, which prompts us to revisit the issue of document length normalization. We perform a comparative analysis of arbitrary elements versus relevant elements, and show the importance of element length as a parameter for XML retrieval. Within the language modeling framework, we investigate a range of techniques that deal with length either directly or indirectly. We observe a length bias introduced by the amount of smoothing, and show the importance of an extreme length bias for XML retrieval. We also show that simply removing shorter elements from the index (by introducing a cut-off value) does not yield an appropriate element length normalization. Even after restricting the minimum size of XML elements occurring in the index, an extreme explicit length bias remains important.
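The three length-related levers mentioned above are the amount of smoothing, an explicit length bias, and a minimum-length cut-off. They can be illustrated with a small query-likelihood ranker. This is a minimal sketch under stated assumptions, not the authors' implementation:

- Smoothing is assumed to be Jelinek-Mercer interpolation with a collection model built from the indexed elements.
- The explicit length bias is assumed to be a prior proportional to |e|^β.
- The cut-off simply drops elements shorter than a threshold before indexing.

The parameter names `lam`, `beta`, and `min_len` and the toy elements are hypothetical.

```python
import math
from collections import Counter


def score_element(query, terms, coll_tf, coll_len, lam, beta):
    """Log of P(e) * prod_t P(t|e) for one element (assumed model, not the paper's code).

    lam  -- hypothetical Jelinek-Mercer weight on the element's own term distribution;
            the remaining (1 - lam) goes to the collection model
    beta -- hypothetical exponent of a length prior P(e) proportional to |e| ** beta
            (the normalizing constant is dropped since it does not change the ranking)
    """
    tf = Counter(terms)
    n = len(terms)
    log_score = beta * math.log(n)  # explicit length bias; beta = 0 gives a uniform prior
    for t in query:
        p = lam * tf[t] / n + (1 - lam) * coll_tf[t] / coll_len
        if p == 0.0:  # term does not occur anywhere in the indexed collection
            return float("-inf")
        log_score += math.log(p)
    return log_score


def rank(query, elements, lam=0.2, beta=0.0, min_len=1):
    """Rank elements by log score; elements shorter than min_len are never indexed (cut-off)."""
    indexed = {eid: ts for eid, ts in elements.items() if len(ts) >= max(min_len, 1)}
    coll_tf = Counter()
    for ts in indexed.values():
        coll_tf.update(ts)  # collection statistics come from the indexed elements only
    coll_len = sum(coll_tf.values())
    scores = {eid: score_element(query, ts, coll_tf, coll_len, lam, beta)
              for eid, ts in indexed.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy elements of very different lengths.
    elements = {
        "title": ["xml", "retrieval"],
        "sec": ["xml", "retrieval", "length", "normalization", "of",
                "xml", "elements", "in", "retrieval"],
        "it": ["xml"],
    }
    query = ["xml", "retrieval"]
    print(rank(query, elements, beta=0.0))            # no length bias
    print(rank(query, elements, beta=2.0))            # strong length bias
    print(rank(query, elements, beta=0.0, min_len=2)) # cut-off removes "it"
```

On this toy data, a uniform prior (`beta=0`) ranks the elements `title`, `it`, `sec`, so the short elements come first. With `beta=2` the order becomes `sec`, `title`, `it`. The cut-off only removes `it` and leaves `title` above `sec`, so it does not by itself push longer elements up. That is consistent in spirit with the observation that a cut-off is not a substitute for an explicit length bias.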