Filtering and clustering relations for unsupervised information extraction in open domain

Authors:
Wei Wang;Romaric Besançon;Olivier Ferret;Brigitte Grau
Affiliations:
CEA LIST, Fontenay-aux-Roses, France;CEA LIST, Fontenay-aux-Roses, France;CEA LIST, Fontenay-aux-Roses, France;LIMSI CNRS, Orsay, France
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Abstract

Information Extraction has recently been extended to new areas by loosening the constraints on the strict definition of the extracted information, which makes it possible to design more open information extraction systems. In this new domain of unsupervised information extraction, we focus on the task of extracting and characterizing a priori unknown relations between a given set of entity types. One of the challenges of this task is dealing with the large number of candidate relations extracted from a large corpus. In this paper, we propose an approach for filtering such candidate relations based on heuristics and machine learning models. More precisely, we show that the best model for this task is a Conditional Random Field model, according to evaluations performed on a manually annotated corpus of about one thousand relations. We also tackle the problem of identifying semantically similar relations by clustering large sets of them. This clustering is achieved by combining a classical clustering algorithm with a method for the efficient identification of highly similar relation pairs. Finally, we evaluate the impact of our relation filtering on this semantic clustering with both internal and external measures. Results show that the filtering procedure doubles the recall of the clustering while keeping the same precision.
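
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the similarity measure nor the clustering algorithm, so the bag-of-words vectors, the cosine measure, the 0.6 threshold, the inverted-index candidate pruning, and the connected-components grouping are all assumptions chosen for illustration, as are the function names `cosine`, `similar_pairs`, and `cluster`.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = sqrt(sum(w * w for w in a.values())) * sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def similar_pairs(vectors, threshold):
    """Return index pairs (i, j), i < j, whose similarity reaches the threshold.

    Assumption: an inverted index limits comparisons to pairs sharing at
    least one term. This is a simple stand-in for a dedicated all-pairs
    similarity search method, not a reproduction of one.
    """
    index = defaultdict(list)
    for i, vec in enumerate(vectors):
        for term in vec:
            index[term].append(i)
    candidates = set()
    for ids in index.values():
        candidates.update(combinations(ids, 2))
    return {(i, j) for i, j in candidates
            if cosine(vectors[i], vectors[j]) >= threshold}


def cluster(n, pairs):
    """Group n items into connected components of the similarity graph.

    Assumption: union-find over the similar pairs stands in for the
    unnamed "classical clustering algorithm".
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Hypothetical relation phrases, invented for this example.
phrases = ["founded the company", "founded the firm", "was born in", "born in"]
vectors = [Counter(p.split()) for p in phrases]
pairs = similar_pairs(vectors, threshold=0.6)
clusters = cluster(len(vectors), pairs)
```

In this toy setting the two "founded" phrases fall into one group and the two "born in" phrases into another. A real system would presumably use richer relation features and a pruning strategy designed for large corpora.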