Information extraction has recently been extended to new areas by relaxing the strict definition of the information to be extracted, which makes it possible to design more open information extraction systems. Within this new field of unsupervised information extraction, we focus on extracting and characterizing a priori unknown relations between a given set of entity types. One challenge of this task is handling the large number of candidate relations extracted from a large corpus. In this paper, we propose an approach for filtering such candidate relations based on heuristics and machine learning models. More precisely, we show that the best model for this task is a Conditional Random Field, according to evaluations performed on a manually annotated corpus of about one thousand relations. We also tackle the problem of identifying semantically similar relations by clustering large sets of them. This clustering combines a classical clustering algorithm with a method for efficiently identifying highly similar relation pairs. Finally, we evaluate the impact of our relation filtering on this semantic clustering with both internal and external measures. Results show that the filtering procedure doubles the recall of the clustering while maintaining the same precision.
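
To make the clustering step more concrete, the following is a minimal sketch of one way to pair an efficient search for highly similar relation pairs with a simple grouping step. It is an illustration under stated assumptions, not the method evaluated in the paper: relations are assumed to be represented as bags of words, similarity is assumed to be cosine, an inverted index stands in for the efficient pair search, and connected components stand in for the "classical clustering algorithm". All function names (`normalize`, `similar_pairs`, `cluster`) and the threshold value are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def normalize(tokens):
    """Turn a non-empty token list into a unit-length sparse vector."""
    counts = Counter(tokens)
    norm = sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}


def similar_pairs(vectors, threshold):
    """Return {(j, i): cosine} for all pairs with cosine >= threshold.

    Assumption: an inverted index is used so that only vectors sharing
    at least one term are ever compared.
    """
    index = defaultdict(list)  # term -> [(vector id, weight)]
    pairs = {}
    for i, vec in enumerate(vectors):
        scores = defaultdict(float)
        for term, w in vec.items():
            for j, wj in index[term]:
                scores[j] += w * wj  # accumulate the dot product
        for j, score in scores.items():
            if score >= threshold:
                pairs[(j, i)] = score
        # Index the current vector only after scoring it, so each
        # pair is considered exactly once.
        for term, w in vec.items():
            index[term].append((i, w))
    return pairs


def cluster(n, pairs):
    """Group items linked by a similar pair (connected components)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)

    groups = defaultdict(list)
    for x in range(n):
        groups[find(x)].append(x)
    return sorted(groups.values())


# Hypothetical relation expressions, already tokenized.
relations = [
    ["founded", "by"],
    ["was", "founded", "by"],
    ["acquired"],
    ["acquired", "by"],
]
vectors = [normalize(r) for r in relations]
clusters = cluster(len(vectors), similar_pairs(vectors, threshold=0.7))
```

In this sketch, filtering candidate relations before clustering would simply shrink the `relations` list, which reduces both the size of the inverted index and the number of spurious links between clusters.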