Resource-Bounded Information Extraction: Acquiring Missing Feature Values on Demand

Authors:
Pallika Kanani;Andrew McCallum;Shaohan Hu
Affiliations:
University of Massachusetts, Amherst;University of Massachusetts, Amherst;Dartmouth College
Venue:
PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
Year:
2010



Abstract

We present a general framework for extracting specific information “on demand” from a large corpus such as the Web under resource constraints. Given a database with missing or uncertain information, the proposed system automatically formulates queries, issues them to a search interface, selects a subset of the returned documents, extracts the required information from them, and fills in the missing values in the original database. We also exploit inherent dependencies within the data to obtain useful information with fewer computational resources. We build such a system in the citation database domain that extracts missing publication years from the Web using limited resources. We discuss a probabilistic approach to this task and present first results. The main contribution of this paper is a general, comprehensive architecture for designing a system that is adaptable to different domains.
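
The pipeline described in the abstract (formulate a query, issue it, select a subset of documents, extract a value, fill in the database) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `fill_missing_years`, `extract_years`, `search`, `budget`, and `docs_per_query` are hypothetical, the search interface is passed in as a plain function, and a simple majority vote over year-like tokens stands in for the probabilistic approach the paper discusses. The sketch also processes records in order and does not model the dependencies within the data that the paper exploits.

```python
import re
from collections import Counter
from typing import Callable, Dict, List

# Assumption: a publication year looks like a four-digit token from 1950 to 2029.
YEAR_RE = re.compile(r"\b(19[5-9]\d|20[0-2]\d)\b")


def extract_years(text: str) -> List[int]:
    """Return every year-like token found in a document's text."""
    return [int(y) for y in YEAR_RE.findall(text)]


def fill_missing_years(
    records: List[Dict],
    search: Callable[[str], List[str]],
    budget: int,
    docs_per_query: int = 2,
) -> int:
    """Fill missing 'year' fields in place, issuing at most `budget` queries.

    `records` is a list of dicts with a 'title' and a 'year' (None if missing).
    `search` is a hypothetical search interface: query string -> document texts.
    Returns the number of queries actually issued.
    """
    queries_used = 0
    for rec in records:
        if rec["year"] is not None:
            continue  # nothing to acquire for this record
        if queries_used >= budget:
            break  # resource bound reached
        # Step 1: formulate a query (here, simply the quoted title).
        query = f'"{rec["title"]}"'
        # Step 2: issue it to the search interface.
        docs = search(query)
        queries_used += 1
        # Step 3: select a subset of documents (here, the top few results).
        # Step 4: extract candidate values from each selected document.
        votes: Counter = Counter()
        for doc in docs[:docs_per_query]:
            votes.update(extract_years(doc))
        # Step 5: fill in the missing value with the most frequent candidate.
        if votes:
            rec["year"] = votes.most_common(1)[0][0]
    return queries_used
```

In this sketch the budget counts queries only. A fuller treatment would also charge for document downloads and extraction, and would choose which record to query next rather than taking records in order.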