Clustering for unsupervised relation identification

Authors:
Benjamin Rosenfeld;Ronen Feldman
Affiliations:
Hebrew University, Jerusalem, Israel;Hebrew University, Jerusalem, Israel
Venue:
Proceedings of the sixteenth ACM conference on information and knowledge management
Year:
2007

Abstract

Unsupervised Relation Identification is the task of automatically discovering interesting relations between entities in large text corpora. Relations are identified by clustering frequently co-occurring pairs of entities so that pairs occurring in similar contexts end up in the same cluster. In this paper we compare several clustering setups, some novel and others previously tried. The setups vary in their feature extraction and selection methods and in their clustering algorithms. To make the comparison, we develop a clustering evaluation metric specifically adapted to the relation identification task. Our experiments demonstrate that single-linkage hierarchical clustering with a novel threshold selection technique is significantly superior to the other clustering algorithms tested. The experiments also indicate that successful relation identification requires rich, complex features of two kinds: features that test both relation slots together ("relation features") and features that test only one slot each ("entity features"). We found that using both kinds of features with the best algorithm produces very high-precision results, a significant improvement over previous work.
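
The core idea in the abstract, grouping entity pairs whose contexts look alike, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy entity pairs, the bag-of-words context features, the cosine similarity, and the fixed threshold of 0.5. The abstract does not describe the paper's feature sets or its threshold selection technique, so none of this should be read as the authors' implementation. The sketch relies only on the fact that cutting a single-linkage hierarchy at a similarity threshold yields the connected components of the graph linking every pair of items at least that similar.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_linkage(items, features, threshold):
    """Single-linkage clusters at a fixed similarity threshold.

    Items are merged whenever any two members are at least `threshold`
    similar, which is tracked here with a union-find structure.
    """
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if cosine(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for x in items:
        clusters.setdefault(find(x), []).append(x)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy data: each entity pair maps to words seen in its contexts.
contexts = {
    ("Oracle", "PeopleSoft"): ["acquired", "bought", "takeover"],
    ("Google", "YouTube"): ["acquired", "bought"],
    ("Steve Jobs", "Apple"): ["ceo", "chief", "executive"],
    ("Bill Gates", "Microsoft"): ["ceo", "chairman", "chief"],
}
features = {pair: Counter(words) for pair, words in contexts.items()}

# The threshold value is an arbitrary assumption for this toy example.
clusters = single_linkage(list(features), features, threshold=0.5)
for cluster in clusters:
    print(cluster)
```

On this toy input the acquisition pairs and the executive pairs fall into separate clusters, because pairs from different groups share no context words. A real setup would replace the bag-of-words vectors with features of the kind the abstract calls relation features and entity features, and would choose the threshold from the data rather than fixing it by hand.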