Automatic set instance extraction using the web

Authors:
Richard C. Wang;William W. Cohen
Affiliations:
Carnegie Mellon University;Carnegie Mellon University
Venue:
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1
Year:
2009

References (12):
Automatic acquisition of hyponyms from large text corpora. COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Acquisition of categorized named entities for web search. Proceedings of the thirteenth ACM international conference on Information and knowledge management
Fast Random Walk with Restart and Its Applications. ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Semantic taxonomy induction from heterogenous evidence. ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Organizing and searching the world wide web of facts -- step two: harnessing the wisdom of the crowds. Proceedings of the 16th international conference on World Wide Web
Weakly-supervised discovery of named entities using web search queries. Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Language-Independent Set Expansion of Named Entities Using the Web. ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Iterative Set Expansion of Named Entities Using the Web. ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Bootstrapping named entity recognition with automatically generated gazetteer lists. EACL '06 Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
Automatic set expansion for list question answering. EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Unsupervised named-entity extraction from the Web: An experimental study. Artificial Intelligence
Unsupervised named-entity recognition: generating gazetteers and resolving ambiguity. AI'06 Proceedings of the 19th international conference on Advances in Artificial Intelligence: Canadian Society for Computational Studies of Intelligence

Cited by (13):
A web service for automatic word class acquisition. Proceedings of the 3rd International Universal Communication Symposium
The role of queries in ranking labeled instances extracted from text. COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Nonlinear evidence fusion and propagation for hyponymy relation mining. HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Recovering semantics of tables on the web. Proceedings of the VLDB Endowment
User Behaviors in Related Word Retrieval and New Word Detection: A Collaborative Perspective. ACM Transactions on Asian Language Information Processing (TALIP)
Towards semantic category verification with arbitrary precision. ICTIR'11 Proceedings of the Third international conference on Advances in information retrieval theory
Asking what no one has asked before: using phrase similarities to generate synthetic web search queries. Proceedings of the 20th ACM international conference on Information and knowledge management
WebSets: extracting sets of entities from the web using unsupervised information extraction. Proceedings of the fifth ACM international conference on Web search and data mining
A semi-supervised approach to extracting multiword entity names from user reviews. Proceedings of the 1st Joint International Workshop on Entity-Oriented and Semantic Search
Ensemble semantics for large-scale unsupervised relation extraction. EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Collectively representing semi-structured data from the web. AKBC-WEKEX '12 Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction
WebPut: efficient web-based data imputation. WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering
Acquisition of open-domain classes via intersective semantics. Proceedings of the 23rd international conference on World wide web

Abstract

Building semantic lexicons from a large corpus is an important and well-studied problem. In this paper, we present a system named ASIA (Automatic Set Instance Acquirer), which takes the name of a semantic class as input (e.g., "car makers") and automatically outputs its instances (e.g., "ford", "nissan", "toyota"). ASIA builds on recent advances in web-based set expansion, the problem of finding all instances of a set given a small number of "seed" instances. This approach effectively exploits web resources and can be easily adapted to different languages. In brief, we use language-dependent hyponym patterns to find a noisy set of initial seeds, and then use a state-of-the-art language-independent set expansion system to expand these seeds. The proposed approach matches or outperforms prior systems on several English-language benchmarks. It also shows excellent performance on three dozen additional benchmark problems in English, Chinese, and Japanese, thus demonstrating language independence.
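
The two-stage idea in the abstract (hyponym patterns to find seeds, then set expansion) can be illustrated with a toy sketch. This is not the ASIA implementation: the function names `find_seeds` and `expand`, the single "such as" pattern, and the list co-occurrence scoring are all assumptions made for illustration, and the scoring is a simple stand-in for the set expansion system the paper actually uses. The inputs are small in-memory strings and lists rather than web pages.

```python
import re
from collections import Counter


def find_seeds(class_name, documents, top_k=3):
    """Stage 1 (illustrative): one English hyponym pattern,
    '<class> such as A, B and C', counted over all documents."""
    pattern = re.compile(
        re.escape(class_name) + r"\s+such as\s+([^.;]+)", re.IGNORECASE
    )
    counts = Counter()
    for doc in documents:
        for match in pattern.finditer(doc):
            # Split the enumeration on commas and the words "and" / "or".
            # This is naive: trailing words after the list would be kept.
            for item in re.split(r",|\band\b|\bor\b", match.group(1)):
                item = item.strip().lower()
                if item:
                    counts[item] += 1
    return [name for name, _ in counts.most_common(top_k)]


def expand(seeds, lists):
    """Stage 2 (toy stand-in): score each non-seed item by the number of
    seeds that appear in the same list, summed over all lists."""
    seed_set = set(seeds)
    scores = Counter()
    for items in lists:
        normalized = {item.lower() for item in items}
        overlap = len(normalized & seed_set)
        if overlap:
            for item in normalized - seed_set:
                scores[item] += overlap
    return [name for name, _ in scores.most_common()]


docs = [
    "We compared car makers such as Ford, Nissan and Toyota.",
    "Analysts follow car makers such as Toyota and Honda.",
]
page_lists = [
    ["Toyota", "Honda", "Mazda"],
    ["Ford", "Nissan", "Mazda", "Kia"],
    ["Apple", "Pear"],
]
seeds = find_seeds("car makers", docs, top_k=4)
print(seeds)
print(expand(["toyota", "ford", "nissan"], page_lists))
```

Only the pattern in `find_seeds` is tied to English; `expand` never inspects the text of an item beyond lowercasing it, which mirrors the split between language-dependent seed finding and language-independent expansion described in the abstract.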