SEISA: set expansion by iterative similarity aggregation

Authors:
Yeye He; Dong Xin
Affiliations:
University of Wisconsin-Madison, Madison, WI, USA; Microsoft Research, Redmond, WA, USA
Venue:
Proceedings of the 20th international conference on World wide web
Year:
2011

Abstract

In this paper, we study the problem of expanding a set of given seed entities into a more complete set by discovering other entities that belong to the same concept set. A typical example is to use "Canon" and "Nikon" as seed entities and derive other entities (e.g., "Olympus") in the same concept set of camera brands. To discover such relevant entities, we exploit several web data sources, including lists extracted from web pages and user queries from a web search engine. While these web data are highly diverse and carry rich information that usually covers a wide range of domains of interest, they tend to be very noisy. We observe that previously proposed random-walk-based approaches do not perform well on these noisy data sources. Accordingly, we propose a new general framework based on iterative similarity aggregation and present detailed experimental results showing that, when using general-purpose web data for set expansion, our approach outperforms previous techniques in both precision and recall.
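
The abstract does not spell out the framework, so the following is only a rough illustrative sketch of what set expansion by iterative similarity aggregation could look like, not the authors' algorithm. It assumes that each entity is represented by the set of web lists it appears in, that similarity is Jaccard overlap, and that the working set has a fixed size. The names `expand`, `jaccard`, `k`, `alpha`, and `max_iters`, as well as the sample data, are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand(seeds, lists, k=3, alpha=0.5, max_iters=10):
    """Hypothetical sketch: keep the k entities with the best aggregated similarity."""
    # Assumption: an entity's context is the set of list ids it occurs in.
    contexts = defaultdict(set)
    for list_id, members in enumerate(lists):
        for entity in members:
            contexts[entity].add(list_id)
    seeds = set(seeds)

    def avg_sim(entity, group):
        # Mean similarity from one entity to every member of a group.
        if not group:
            return 0.0
        own = contexts.get(entity, set())
        return sum(jaccard(own, contexts.get(g, set())) for g in group) / len(group)

    def top_k(score):
        # Highest score first; ties broken by name so the result is deterministic.
        return set(sorted(score, key=lambda e: (-score[e], e))[:k])

    # Relevance to the seeds is computed once and stays fixed.
    rel = {e: avg_sim(e, seeds) for e in contexts}
    current = top_k(rel)
    for _ in range(max_iters):
        # Aggregate seed relevance with similarity to the current working set.
        combined = {e: alpha * rel[e] + (1 - alpha) * avg_sim(e, current)
                    for e in contexts}
        updated = top_k(combined)
        if updated == current:  # working set is stable, stop iterating
            break
        current = updated
    return sorted(current - seeds)


# Made-up sample data: three camera-brand lists plus two noisy lists.
web_lists = [
    ["Canon", "Nikon", "Olympus"],
    ["Canon", "Nikon", "Olympus", "Sony"],
    ["Canon", "Olympus", "Nikon"],
    ["Nikon", "Apple", "Banana"],
    ["Apple", "Banana", "Cherry"],
]
print(expand({"Canon", "Nikon"}, web_lists, k=3))  # prints ['Olympus']
```

In this toy data the noisy list that pairs "Nikon" with fruit names gives "Apple" and "Banana" some relevance to the seeds, but they share little context with the rest of the working set, so the aggregated score keeps them out. How the paper itself sets thresholds and similarity measures is not stated in the abstract.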