Iterative Set Expansion of Named Entities Using the Web

Authors:
Richard C. Wang; William W. Cohen
Venue:
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Year:
2008

Cited by (22):

Automatic set instance extraction using the web
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1

ONTOMO: Development of Ontology Building Service
PRIMA '09 Proceedings of the 12th International Conference on Principles of Practice in Multi-Agent Systems

Web-scale distributional similarity and entity set expansion
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2

Character-level analysis of semi-structured documents for set expansion
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3

Coupled semi-supervised learning for information extraction
Proceedings of the third ACM international conference on Web search and data mining

Learning 5000 relational extractors
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Distributional similarity vs. PU learning for entity set expansion
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers

Corpus-based semantic class mining: distributional vs. pattern-based approaches
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics

The role of queries in ranking labeled instances extracted from text
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters

SEISA: set expansion by iterative similarity aggregation
Proceedings of the 20th international conference on World wide web

Entity set expansion in opinion documents
Proceedings of the 22nd ACM conference on Hypertext and hypermedia

ITEM: extract and integrate entities from tabular data to RDF knowledge base
APWeb'11 Proceedings of the 13th Asia-Pacific web conference on Web technologies and applications

Recovering semantics of tables on the web
Proceedings of the VLDB Endowment

Automatically building training examples for entity extraction
CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning

User Behaviors in Related Word Retrieval and New Word Detection: A Collaborative Perspective
ACM Transactions on Asian Language Information Processing (TALIP)

Finding dimensions for queries
Proceedings of the 20th ACM international conference on Information and knowledge management

An analysis of structured data on the web
Proceedings of the VLDB Endowment

Finding related tables
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data

Learning to find comparable entities on the web
WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering

Fusing distributional and experiential information for measuring semantic relatedness
Information Fusion

Autonomously reviewing and validating the knowledge base of a never-ending learning system
Proceedings of the 22nd international conference on World Wide Web companion

Methods for exploring and mining tables on Wikipedia
Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics

Abstract

Set expansion refers to expanding a partial set of "seed" objects into a more complete set. One system that does set expansion is SEAL (Set Expander for Any Language), which expands entities automatically by utilizing resources from the Web in a language-independent fashion. In a previous study, SEAL showed good set expansion performance using three seed entities; however, when given a larger set of seeds (e.g., ten), SEAL's expansion method performs poorly. In this paper, we present Iterative SEAL (iSEAL), which allows a user to provide many seeds. Briefly, iSEAL makes several calls to SEAL, each call using a small number of seeds. We also show that iSEAL can be used in a "bootstrapping" manner, where each call to SEAL uses a mixture of user-provided and self-generated seeds. We show that the bootstrapping version of iSEAL obtains better results than SEAL even when using fewer user-provided seeds. In addition, we compare the performance of various ranking algorithms used in iSEAL, and show that the choice of ranking method has a small effect on performance when all seeds are user-provided, but a large effect when iSEAL is bootstrapped. In particular, we show that Random Walk with Restart is nearly as good as Bayesian Sets with user-provided seeds, and performs best with bootstrapped seeds.
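
The iterative scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `iterative_expand`, `toy_expand`, and `TOY_LISTS` are hypothetical, the toy expander only stands in for a call to SEAL, and the seed-mixing rule (one user seed plus the top-ranked unused candidates) and the score summation are assumptions chosen for brevity rather than the ranking methods compared in the paper.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical stand-in for web pages: each inner list is one "list" found online.
TOY_LISTS = [
    ["ford", "toyota", "nissan", "honda"],
    ["toyota", "honda", "bmw"],
    ["ford", "nissan", "audi"],
    ["bmw", "audi", "toyota"],
]

Expander = Callable[[Sequence[str]], Dict[str, float]]


def toy_expand(seeds: Sequence[str]) -> Dict[str, float]:
    """Assumed single expansion call: score items from lists containing all seeds."""
    hits: Dict[str, float] = defaultdict(float)
    for items in TOY_LISTS:
        if all(seed in items for seed in seeds):
            for item in items:
                hits[item] += 1.0
    return dict(hits)


def iterative_expand(
    user_seeds: Sequence[str],
    expand: Expander,
    iterations: int = 4,
    seeds_per_call: int = 2,
    bootstrap: bool = False,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Call `expand` several times with few seeds each and return a ranked list."""
    rng = rng or random.Random(0)
    totals: Dict[str, float] = defaultdict(float)
    used_as_seed = set()
    for _ in range(iterations):
        if bootstrap and totals:
            # Assumption: mix one user seed with top-ranked, not-yet-used candidates.
            ranked = sorted(totals, key=lambda e: (-totals[e], e))
            fresh = [e for e in ranked
                     if e not in used_as_seed and e not in user_seeds]
            seeds = [rng.choice(list(user_seeds))] + fresh[: seeds_per_call - 1]
        else:
            # Supervised mode: every seed comes from the user-provided set.
            count = min(seeds_per_call, len(user_seeds))
            seeds = rng.sample(list(user_seeds), count)
        used_as_seed.update(seeds)
        # Assumption: aggregate by summing scores across calls.
        for entity, score in expand(seeds).items():
            totals[entity] += score
    return sorted(totals, key=lambda e: (-totals[e], e))
```

Calling `iterative_expand(["ford", "toyota", "nissan"], toy_expand)` runs the supervised variant, and passing `bootstrap=True` lets later calls reuse the sketch's own candidates as seeds. A real system would replace the score summation with a ranker such as Random Walk with Restart or Bayesian Sets, which this sketch does not attempt.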