Maximum likelihood estimation for filtering thresholds

Authors:
Yi Zhang;Jamie Callan
Affiliations:
Carnegie Mellon Univ., Pittsburgh, PA;Carnegie Mellon Univ., Pittsburgh, PA
Venue:
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2001

Abstract

Information filtering systems based on statistical retrieval models usually compute a numeric score indicating how well each document matches each profile. Documents with scores above profile-specific dissemination thresholds are delivered. An optimal dissemination threshold is one that maximizes a given utility function based on the distributions of the scores of relevant and non-relevant documents. The parameters of these distributions can be estimated using relevance information, but relevance information obtained while filtering is biased. This paper presents a new method of adjusting dissemination thresholds that explicitly models and compensates for this bias. The new algorithm, which is based on the Maximum Likelihood principle, jointly estimates the parameters of the density distributions for relevant and non-relevant documents and the proportion of relevant documents in the corpus. Experiments with TREC-8 and TREC-9 Filtering Track data demonstrate the effectiveness of the algorithm.
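
The idea in the abstract can be illustrated with a minimal sketch. It is not taken from the abstract itself and rests on several assumptions: relevant scores follow a Gaussian, non-relevant scores follow a shifted exponential, relevance judgments exist only for documents scoring above a single fixed threshold `theta`, and utility is linear (a credit per delivered relevant document, a cost per delivered non-relevant one). All function and parameter names are hypothetical. The sketch shows a likelihood that is conditioned on delivery, which is one way to compensate for the sampling bias, and a grid search for the threshold that maximizes expected utility once the parameters are known.

```python
import math


def normal_sf(x, mu, sigma):
    """P(S > x) under the assumed Gaussian model for relevant scores."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def expon_sf(x, lam, origin):
    """P(S > x) under the assumed shifted exponential for non-relevant scores."""
    return 1.0 if x <= origin else math.exp(-lam * (x - origin))


def expon_pdf(x, lam, origin):
    return 0.0 if x < origin else lam * math.exp(-lam * (x - origin))


def truncated_log_likelihood(samples, theta, mu, sigma, lam, origin, p):
    """Log-likelihood of (score, is_relevant) pairs seen only when score > theta.

    Each joint density is divided by the probability that a document is
    delivered at all, so the estimate accounts for the missing low scores.
    Assumes every non-relevant score is at least `origin`.
    """
    mass_above = (p * normal_sf(theta, mu, sigma)
                  + (1.0 - p) * expon_sf(theta, lam, origin))
    total = 0.0
    for score, is_relevant in samples:
        if is_relevant:
            joint = p * normal_pdf(score, mu, sigma)
        else:
            joint = (1.0 - p) * expon_pdf(score, lam, origin)
        total += math.log(joint) - math.log(mass_above)
    return total


def expected_utility(t, mu, sigma, lam, origin, p, credit=2.0, cost=1.0):
    """Expected linear utility per incoming document when delivering scores > t."""
    return (credit * p * normal_sf(t, mu, sigma)
            - cost * (1.0 - p) * expon_sf(t, lam, origin))


def best_threshold(candidates, **params):
    """Pick the candidate threshold with the highest expected utility."""
    return max(candidates, key=lambda t: expected_utility(t, **params))


if __name__ == "__main__":
    params = dict(mu=0.5, sigma=0.1, lam=20.0, origin=0.0, p=0.05)
    grid = [i / 100.0 for i in range(101)]
    print("threshold:", best_threshold(grid, **params))
```

In a full estimator, `truncated_log_likelihood` would be maximized over `mu`, `sigma`, `lam`, and `p` with a numerical optimizer; that step is omitted here to keep the sketch dependency-free.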