Collective information extraction using first-order probabilistic models

Authors:
Slavko Žitnik;Lovro Šubelj;Dejan Lavbič;Aljaž Zrnec;Marko Bajec
Affiliations:
University of Ljubljana, Ljubljana, Slovenia;University of Ljubljana, Ljubljana, Slovenia;University of Ljubljana, Ljubljana, Slovenia;University of Ljubljana, Ljubljana, Slovenia;University of Ljubljana, Ljubljana, Slovenia
Venue:
Proceedings of the Fifth Balkan Conference in Informatics
Year:
2012

Abstract

Traditional information extraction (IE) consists roughly of three tasks: named-entity recognition, relation extraction and coreference resolution. Much work in this area treats these subtasks separately, and the best performance is achieved only on specialized domains. In this paper we present a collective IE approach that combines all three tasks using linear-chain conditional random fields. Using probabilistic models lets the tasks exchange information on the fly and correct errors during iterative execution. We introduce a novel iterative IE system architecture with additional semantic and collective feature functions. The proposed system is evaluated on a real-world data set, introduced in the paper, and on two tested tasks it improves on traditional approaches, with lower error and better performance.
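
As a rough illustration of the linear-chain sequence labeling the abstract refers to, the sketch below finds the highest-scoring label sequence with the standard Viterbi algorithm. The function name `viterbi`, the label set, and all weights are hypothetical and are not taken from the paper. The paper's semantic and collective feature functions and its iterative architecture are not reproduced here.

```python
from typing import Dict, List, Tuple

Label = str


def viterbi(
    tokens: List[str],
    labels: List[Label],
    emission: Dict[Tuple[str, Label], float],
    transition: Dict[Tuple[Label, Label], float],
) -> List[Label]:
    """Return the highest-scoring label sequence under a linear-chain model.

    The score of a labeling is the sum of emission weights (token, label)
    and transition weights (previous label, label). Missing weights are 0.
    """
    if not tokens:
        return []

    # best[i][y]: best score of any labeling of tokens[:i + 1] that ends in y
    best = [{y: emission.get((tokens[0], y), 0.0) for y in labels}]
    back: List[Dict[Label, Label]] = []

    for i in range(1, len(tokens)):
        scores: Dict[Label, float] = {}
        pointers: Dict[Label, Label] = {}
        for y in labels:
            # Pick the previous label that maximizes the score of ending in y.
            prev = max(
                labels,
                key=lambda p: best[-1][p] + transition.get((p, y), 0.0),
            )
            scores[y] = (
                best[-1][prev]
                + transition.get((prev, y), 0.0)
                + emission.get((tokens[i], y), 0.0)
            )
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy weights, chosen only to make the example readable.
tokens = ["Marko", "works", "in", "Ljubljana"]
labels = ["PER", "LOC", "O"]
emission = {
    ("Marko", "PER"): 2.0,
    ("works", "O"): 1.0,
    ("in", "O"): 1.0,
    ("Ljubljana", "LOC"): 2.0,
}
transition = {("O", "LOC"): 0.5}

print(viterbi(tokens, labels, emission, transition))
# → ['PER', 'O', 'O', 'LOC']
```

Decoding only needs the unnormalized scores, so the sketch omits the partition function that a full conditional random field uses to turn scores into probabilities.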