Aggregating semantic annotators

Authors: Luying Chen, Stefano Ortona, Giorgio Orsi, Michael Benedikt
Affiliation: Oxford University, UK (all authors)
Venue: Proceedings of the VLDB Endowment
Year: 2013

Abstract

A growing number of resources are available for enriching documents with semantic annotations. While the ecosystem of annotators originally focused on a few standard classes of annotations, it is now becoming increasingly diverse. Although annotators often have very different vocabularies, containing both high-level and specialist concepts, these vocabularies also have many semantic interconnections. We show that both the overlap and the diversity in annotator vocabularies motivate the need for semantic annotation integration: middleware that produces a unified annotation on top of diverse semantic annotators. On the one hand, the diversity of vocabularies allows applications to benefit from the much richer vocabulary that an integrated annotator offers. On the other hand, we present evidence that the most widely used annotators on the web suffer from serious accuracy deficiencies. The overlap among the individual annotators' vocabularies allows an integrated annotator to boost accuracy by exploiting inter-annotator agreement and disagreement. Integrating semantic annotations raises new challenges, compared both with usual data integration scenarios and with standard aggregation of machine learning tools. We outline an approach to these challenges that performs ontology-aware aggregation. We introduce an approach that requires no training data and makes use of ideas from database repair. We experimentally compare this approach with a supervised approach that adapts maximum entropy Markov models to the setting of ontology-based annotations. We further compare both approaches experimentally against ontology-unaware supervised approaches and against the individual annotators.
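The following minimal Python sketch illustrates why overlapping vocabularies can help. It contrasts an ontology-unaware majority vote with a simple ontology-aware vote, in which a label also counts as support for all of its superclasses. This is only an illustration of exploiting agreement through the class hierarchy. It is not the repair-based or maximum-entropy-Markov-model method described above. The toy ontology SUPERCLASS and the functions ancestors, plain_vote and ontology_vote are hypothetical. The sketch assumes a tree-shaped class hierarchy and a majority threshold.

```python
from collections import Counter

# Hypothetical toy ontology (tree-shaped, for illustration only): each class
# maps to its direct superclass, with None marking the root.
SUPERCLASS = {
    "Entity": None,
    "Location": "Entity",
    "City": "Location",
    "Country": "Location",
    "Person": "Entity",
    "Politician": "Person",
}


def ancestors(cls):
    """Return cls followed by all of its superclasses up to the root."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = SUPERCLASS[cls]
    return chain


def plain_vote(labels, threshold=0.5):
    """Ontology-unaware baseline: accept a label only if more than
    `threshold` of the annotators produced exactly that label."""
    if not labels:
        return None
    cls, count = Counter(labels).most_common(1)[0]
    return cls if count / len(labels) > threshold else None


def ontology_vote(labels, threshold=0.5):
    """Ontology-aware vote over the labels annotators gave one text span.

    Each label also counts as a vote for every superclass, so 'City' from
    one annotator and 'Location' from another agree on 'Location'. Returns
    the most specific class supported by more than `threshold` of the
    annotators, or None if no class reaches that level of support.
    """
    if not labels:
        return None
    support = Counter()
    for label in labels:
        support.update(ancestors(label))
    accepted = [c for c, s in support.items() if s / len(labels) > threshold]
    if not accepted:
        return None
    # With a tree ontology and threshold >= 0.5, the accepted classes lie on
    # one root-to-leaf path, so the deepest accepted class is unique.
    return max(accepted, key=lambda c: len(ancestors(c)))


if __name__ == "__main__":
    votes = ["City", "Location", "Person"]  # three annotators, one span
    print(plain_vote(votes))     # None: no label is repeated by a majority
    print(ontology_vote(votes))  # Location: supported by 2 of 3 annotators
```

Propagating votes up the hierarchy is what lets partial agreement (City versus Location) count as agreement. A fuller treatment would also need disjointness constraints and per-annotator reliability, both of which this sketch ignores.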