Entity set expansion using topic information

Authors:
Kugatsu Sadamitsu; Kuniko Saito; Kenji Imamura; Genichiro Kikui
Affiliations:
NTT Cyber Space Laboratories, NTT Corporation, Hikarinooka, Yokosuka-shi, Kanagawa, Japan (all authors)
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Year:
2011

Abstract

This paper proposes three modules based on latent topics of documents for alleviating "semantic drift" in bootstrapping entity set expansion. These new modules are added to a discriminative bootstrapping algorithm to realize topic feature generation, negative example selection and entity candidate pruning. In this study, we model latent topics with LDA (Latent Dirichlet Allocation) in an unsupervised way. Experiments show that the accuracy of the extracted entities is improved by 6.7 to 28.2% depending on the domain.
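
The abstract names the three topic-based modules but does not say how they are implemented, so the following is only a minimal sketch of how such modules might look, not the authors' method. It assumes that per-document topic posteriors have already been estimated by an LDA model fit elsewhere; the toy values, the entity-to-document mapping, the function names, and both thresholds are hypothetical and chosen purely for illustration.

```python
# Toy per-document topic posteriors p(z | d). ASSUMPTION: in practice these
# would come from an LDA model; the numbers here are invented for illustration.
DOC_TOPICS = {
    "d1": [0.8, 0.1, 0.1],
    "d2": [0.7, 0.2, 0.1],
    "d3": [0.1, 0.8, 0.1],
    "d4": [0.1, 0.1, 0.8],
}

# Hypothetical mapping from entity strings to the documents that mention them.
MENTIONS = {
    "Toyota": ["d1", "d2"],
    "Honda": ["d1"],
    "Nissan": ["d2"],
    "Mozart": ["d3"],
    "Paris": ["d4"],
    "Jaguar": ["d2", "d4"],
}


def entity_topics(entity):
    """Topic feature generation: average p(z | d) over an entity's documents."""
    docs = MENTIONS[entity]
    num_topics = len(DOC_TOPICS[docs[0]])
    return [sum(DOC_TOPICS[d][z] for d in docs) / len(docs)
            for z in range(num_topics)]


def seed_topic(seeds):
    """Pick the topic with the highest average weight across the seed entities."""
    vectors = [entity_topics(s) for s in seeds]
    num_topics = len(vectors[0])
    avg = [sum(v[z] for v in vectors) / len(vectors) for z in range(num_topics)]
    return max(range(num_topics), key=lambda z: avg[z])


def select_negatives(seeds, z_seed, max_prob=0.2):
    """Negative example selection: non-seed entities far from the seed topic."""
    return sorted(e for e in MENTIONS
                  if e not in seeds and entity_topics(e)[z_seed] <= max_prob)


def prune_candidates(candidates, z_seed, min_prob=0.5):
    """Candidate pruning: keep only candidates close to the seed topic."""
    return [c for c in candidates if entity_topics(c)[z_seed] >= min_prob]


seeds = ["Toyota", "Honda"]
z = seed_topic(seeds)
negatives = select_negatives(seeds, z)
kept = prune_candidates(["Nissan", "Jaguar", "Mozart"], z)
```

In this toy setup the ambiguous entity "Jaguar" is neither used as a negative example nor kept as a candidate, which illustrates how topic information could hold back candidates that tend to cause semantic drift. The topic vector returned by `entity_topics` could also be appended to the feature set of a discriminative classifier.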