Assessing sparse information extraction using semantic contexts

Authors:
Peipei Li;Haixun Wang;Hongsong Li;Xindong Wu
Affiliations:
Hefei University of Technology, Hefei city, China;Microsoft Research Asia, Bei Jing, China;Microsoft Research Asia, Bei Jing, China;University of Vermont, Vermont, USA
Venue:
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Year:
2013

Citing 18
Cited 0

Snowball: extracting relations from large plain-text collections

DL '00 Proceedings of the fifth ACM conference on Digital libraries
Automatic acquisition of hyponyms from large text corpora

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
StatSnowball: a statistical approach to extracting entity relationships

Proceedings of the 18th international conference on World wide web
Boosting unsupervised relation extraction by using NER

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Confidence estimation for information extraction

HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
Deriving a large scale taxonomy from Wikipedia

AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
A probabilistic model of redundancy in information extraction

IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Analysis of a probabilistic model of redundancy in unsupervised information extraction

Artificial Intelligence
Improved extraction assessment through better language models

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Open information extraction using Wikipedia

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Collective cross-document relation extraction without labelled data

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Boosting relation extraction with limited closed-world knowledge

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
YAGO2: exploring and querying world knowledge in time, space, context, and many languages

Proceedings of the 20th international conference companion on World wide web
Knowledge-based weak supervision for information extraction of overlapping relations

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Filtering and clustering relations for unsupervised information extraction in open domain

Proceedings of the 20th ACM international conference on Information and knowledge management
WebSets: extracting sets of entities from the web using unsupervised information extraction

Proceedings of the fifth ACM international conference on Web search and data mining
Probase: a probabilistic taxonomy for text understanding

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Efficient extraction of ontologies from domain specific text corpora

Proceedings of the 21st ACM international conference on Information and knowledge management

Quantified Score

Hi-index	0.00

Visualization

Abstract

One important assumption of information extraction is that extractions occurring more frequently are more likely to be correct. Sparse information extraction is challenging because no matter how big a corpus is, there are extractions supported by only a small amount of evidence in the corpus. A pioneering work known as REALM learns HMMs to model the context of a semantic relationship for assessing the extractions. This is quite costly and the semantics revealed for the context are not explicit. In this work, we introduce a lightweight, explicit semantic approach for sparse information extraction. We use a large semantic network consisting of millions of concepts, entities, and attributes to explicitly model the context of semantic relationships. Experiments show that our approach improves the F-score of extraction by at least 11.2% over state-of-the-art, HMM based approaches while maintaining more efficiency.