Classical bag-of-words models fail to capture contextual associations between words. We propose to investigate the "high-order pure dependence" among a number of words forming a semantic entity, i.e., the high-order dependence that cannot be reduced to the random coincidence of lower-order dependence. We believe that identifying these high-order pure dependence patterns will lead to a better representation of documents. We first present two formal definitions of pure dependence: Unconditional Pure Dependence (UPD) and Conditional Pure Dependence (CPD). Deciding whether UPD or CPD holds, however, is an NP-hard problem. We hence prove a series of sufficient criteria that entail UPD and CPD within the well-principled Information Geometry (IG) framework, leading to a more feasible UPD/CPD identification procedure. We further develop novel methods to extract word patterns with high-order pure dependence, which can then be used to extend the original unigram document models. Our methods are evaluated in the context of query expansion. Compared with the original unigram model and its extensions with term associations derived from constant n-grams and Apriori association rule mining, our IG-based methods prove mathematically more rigorous and empirically more effective.
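To make the idea of a dependence that cannot be reduced to lower-order dependence concrete, the following is a minimal sketch and not the paper's UPD/CPD criteria. It assumes each word is modeled as a binary occurrence indicator and uses the standard third-order interaction coordinate of the log-linear expansion of a joint distribution over three binary variables. The function names (`third_order_theta`, `independent_joint`, `near_xor_joint`, `pair_marginal`) and the toy distributions are hypothetical and chosen only for illustration.

```python
import math
from itertools import product


def third_order_theta(p):
    """Third-order log-linear interaction of three binary variables.

    `p` maps each (x1, x2, x3) in {0,1}^3 to a strictly positive probability.
    This is the coefficient of x1*x2*x3 in the expansion of log p(x).
    """
    num = p[(1, 1, 1)] * p[(1, 0, 0)] * p[(0, 1, 0)] * p[(0, 0, 1)]
    den = p[(1, 1, 0)] * p[(1, 0, 1)] * p[(0, 1, 1)] * p[(0, 0, 0)]
    return math.log(num / den)


def independent_joint(m1, m2, m3):
    """Joint distribution of three independent indicators with P(x_i = 1) = m_i."""
    marginals = (m1, m2, m3)
    joint = {}
    for x in product((0, 1), repeat=3):
        prob = 1.0
        for xi, m in zip(x, marginals):
            prob *= m if xi == 1 else 1.0 - m
        joint[x] = prob
    return joint


def near_xor_joint(eps):
    """Toy distribution: mass (1 - eps) spread evenly over states with
    x3 == x1 XOR x2, and mass eps spread evenly over the other four states."""
    return {
        x: (1.0 - eps) / 4 if x[2] == (x[0] ^ x[1]) else eps / 4
        for x in product((0, 1), repeat=3)
    }


def pair_marginal(p, i, j):
    """P(x_i = 1, x_j = 1) under the joint distribution `p`."""
    return sum(v for x, v in p.items() if x[i] == 1 and x[j] == 1)


if __name__ == "__main__":
    indep = independent_joint(0.3, 0.5, 0.2)
    xor_like = near_xor_joint(0.04)
    # Independent indicators: the third-order term vanishes up to rounding.
    print("independent:", third_order_theta(indep))
    # XOR-like indicators: every pair looks independent, yet the
    # third-order term is far from zero.
    print("pairwise P(x1=1, x2=1):", pair_marginal(xor_like, 0, 1))
    print("xor-like:", third_order_theta(xor_like))
```

In the XOR-like toy distribution every pair of indicators is exactly independent, so no pairwise statistic reveals the association, while the third-order term is clearly nonzero. That is the kind of pattern the abstract calls pure high-order dependence. Note that this sketch requires strictly positive probabilities, which real co-occurrence counts would need smoothing to guarantee.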