Discovering missing values in semi-structured databases

Authors:
Xing Yi;James Allan;Victor Lavrenko
Affiliations:
University of Massachusetts, Amherst, MA;University of Massachusetts, Amherst, MA;University of Massachusetts, Amherst, MA
Venue:
Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
Year:
2007

Abstract

We explore the problem of discovering multiple missing values in a semi-structured database. For this task, we formally develop the Structured Relevance Model (SRM), built on a hypothetical generative model for semi-structured records. SRM is based on the idea that plausible values for a given field can be inferred from the context provided by the other fields in the record. Small-scale experiments on IMDb (the Internet Movie Database) show that SRM matched the performance of three state-of-the-art relational learning approaches on movie label prediction tasks. Large-scale experiments on a snapshot of the National Science Digital Library (NSDL) repository show that, compared with state-of-the-art machine learning approaches, SRM is highly effective at discovering possible values for free-text fields even with quite modest amounts of training data.
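
The central idea, inferring plausible values for one field from the context in the other fields, can be illustrated with a toy sketch. This is not the paper's estimator: the function name `rank_missing_values`, the Jelinek-Mercer smoothing weight `lam`, the add-one background smoothing, and the three example records are all assumptions made for illustration. The sketch weights each training record by how well it explains the observed fields of an incomplete record, then ranks candidate values for the missing field by their weighted frequency.

```python
from collections import Counter
import math


def rank_missing_values(query, records, target, lam=0.8):
    """Rank candidate words for the `target` field of `query` (illustrative only)."""
    # Collection-wide word counts per field, used as a smoothing background.
    background = {}
    for rec in records:
        for field, words in rec.items():
            background.setdefault(field, Counter()).update(words)

    def word_prob(word, field, rec):
        # Jelinek-Mercer mix of the record's own field text and the background.
        words = rec.get(field, [])
        local = words.count(word) / len(words) if words else 0.0
        bg = background.get(field, Counter())
        # Add-one smoothing (an assumption) keeps unseen words above zero.
        bg_prob = (bg[word] + 1) / (sum(bg.values()) + len(bg) + 1)
        return lam * local + (1 - lam) * bg_prob

    # Log-likelihood of the query's observed fields under each training record.
    log_weights = []
    for rec in records:
        ll = 0.0
        for field, words in query.items():
            if field == target:
                continue
            for w in words:
                ll += math.log(word_prob(w, field, rec))
        log_weights.append(ll)

    # Normalise into weights; subtracting the max avoids numerical underflow.
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Score each candidate by its weighted relative frequency in the target field.
    scores = Counter()
    for rec, weight in zip(records, weights):
        values = rec.get(target, [])
        for v in values:
            scores[v] += weight / len(values)
    return scores.most_common()


# Hypothetical toy records; the field names are made up for this example.
records = [
    {"title": ["volcano", "eruption", "basics"], "subject": ["geology"]},
    {"title": ["volcano", "hazards"], "subject": ["geology"]},
    {"title": ["cell", "division"], "subject": ["biology"]},
]
query = {"title": ["volcano", "eruption"]}  # the "subject" field is missing
ranking = rank_missing_values(query, records, target="subject")
print(ranking)
```

On this toy data, "geology" ranks above "biology" because the two records whose titles share words with the query both carry that subject. A realistic system would need a principled choice of smoothing and per-field weighting, which this sketch does not attempt.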