Improving author coreference by resource-bounded information gathering from the web

Authors:
Pallika Kanani;Andrew McCallum;Chris Pal
Affiliations:
Department of Computer Science, University of Massachusetts Amherst, Amherst, MA;Department of Computer Science, University of Massachusetts Amherst, Amherst, MA;Department of Computer Science, University of Massachusetts Amherst, Amherst, MA
Venue:
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Year:
2007

Abstract

Accurate entity resolution is sometimes impossible simply because of insufficient information. For example, in research paper author name resolution, even clever use of venue, title and coauthorship relations is often not enough to make a confident coreference decision. This paper presents several methods for increasing accuracy by gathering and integrating additional evidence from the web. We formulate the coreference problem as one of graph partitioning with discriminatively trained edge weights, and then incorporate web information either as additional features or as additional nodes in the graph. Since the web is too large for all of its data to be incorporated, we need an efficient procedure for selecting a subset of web queries and data. We formally describe the problem of resource-bounded information gathering in each of these contexts, and show significant accuracy improvement at low cost.
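
To make the setup in the abstract concrete, the following is a minimal sketch, not the authors' method. It assumes author mentions are graph nodes, that each pair has a signed edge weight (positive suggests coreference, negative suggests distinct authors), and that a fixed query budget is spent on the edges whose weights are closest to zero. The greedy merging procedure, the uncertainty-based selection rule, and all weights and web evidence values below are illustrative assumptions; the abstract does not specify them.

```python
from itertools import combinations

def partition(nodes, weight):
    """Greedy agglomerative partitioning (an assumed, simple heuristic):
    repeatedly merge the two clusters whose summed cross-edge weight
    is largest, as long as that sum is positive."""
    clusters = [{n} for n in nodes]
    while True:
        best_score, best_pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            score = sum(weight[frozenset((u, v))]
                        for u in clusters[i] for v in clusters[j])
            if score > best_score:
                best_score, best_pair = score, (i, j)
        if best_pair is None:
            return clusters
        i, j = best_pair
        clusters[i] |= clusters[j]
        del clusters[j]

def select_queries(weight, budget):
    """Spend the budget on the most uncertain edges, i.e. those whose
    weight is closest to zero (assumed selection criterion)."""
    return sorted(weight, key=lambda edge: abs(weight[edge]))[:budget]

# Hypothetical mentions and edge weights, invented for illustration.
nodes = ["a", "b", "c", "d"]
weight = {
    frozenset(("a", "b")): 2.0,
    frozenset(("a", "c")): -0.1,
    frozenset(("a", "d")): -3.0,
    frozenset(("b", "c")): 0.2,
    frozenset(("b", "d")): -2.0,
    frozenset(("c", "d")): -0.3,
}

before = partition(nodes, weight)

# Only two web queries are allowed; pick the two least certain edges.
chosen = select_queries(weight, budget=2)

# Hypothetical evidence returned by those queries, added as a weight shift.
web_evidence = {
    frozenset(("a", "c")): -1.5,
    frozenset(("b", "c")): -1.0,
}
updated = dict(weight)
for edge in chosen:
    updated[edge] += web_evidence[edge]

after = partition(nodes, updated)
```

With these made-up numbers, the weakly supported merge of mention `c` into the cluster containing `a` and `b` is undone once the two queried edges receive additional negative evidence, while the confident edges are never queried. This corresponds to the "additional features" variant only; adding web pages as extra nodes would require extending `nodes` and `weight` instead.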