Entity discovery and annotation in tables

Authors:
Gianluca Quercini;Chantal Reynaud
Affiliations:
Université Paris-Sud XI;Université Paris-Sud XI
Venue:
Proceedings of the 16th International Conference on Extending Database Technology
Year:
2013

Abstract

The Web is rich in tables (e.g., HTML tables, spreadsheets, Google Fusion Tables) that host a considerable wealth of high-quality relational data. Unlike unstructured text, tables lend themselves to automatic data extraction because of their regular structure and properties. Data extraction is usually complemented by the annotation of the table, which determines its semantics by identifying a type for each column, the relations between columns, if any, and the entities that occur in each cell. In this paper, we focus on the problem of discovering and annotating entities in tables. More specifically, we describe an algorithm that identifies the rows of a table that contain information on entities of specific types (e.g., restaurant, museum, theatre) derived from an ontology and determines the cells in which the names of those entities occur. We implemented this algorithm while developing a faceted browser over a repository of RDF data on points of interest in cities that we extracted from Google Fusion Tables. We claim that our algorithm complements existing approaches, which annotate entities in a table based on a pre-compiled reference catalogue that lists the types of a finite set of entities and are therefore unable to discover and annotate entities that do not belong to the reference catalogue. Instead, we train our algorithm to look for information on previously unseen entities on the Web so as to annotate them with the correct type.
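
To make the task concrete, the following is a minimal sketch of row-level entity discovery, not the algorithm of the paper. The abstract says the algorithm is trained to look up unseen entities on the Web; this sketch swaps in plain keyword matching instead of a trained model, and every name in it (`TYPE_KEYWORDS`, `score_text`, `annotate_rows`, `fetch_snippets`, `min_score`, the toy data) is a hypothetical choice made for illustration. The Web lookup is passed in as a function so that the example runs offline against a stub.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical keyword profiles for three ontology types (illustrative only;
# the paper trains a model rather than using fixed keyword lists).
TYPE_KEYWORDS: Dict[str, Set[str]] = {
    "restaurant": {"menu", "cuisine", "dinner", "chef"},
    "museum": {"exhibition", "collection", "gallery", "art"},
    "theatre": {"stage", "play", "performance", "tickets"},
}


def score_text(text: str) -> Dict[str, int]:
    """Count, for each type, how many of its keywords occur in the text."""
    words = set(text.lower().split())
    return {t: len(words & kws) for t, kws in TYPE_KEYWORDS.items()}


def annotate_rows(
    table: List[List[str]],
    fetch_snippets: Callable[[str], List[str]],
    min_score: int = 2,
) -> List[Optional[Tuple[int, str]]]:
    """For each row, return (cell index, type) for the cell whose Web
    snippets best match a type, or None if no cell reaches min_score."""
    annotations: List[Optional[Tuple[int, str]]] = []
    for row in table:
        best: Optional[Tuple[int, str]] = None
        best_score = 0
        for idx, cell in enumerate(row):
            # Treat the cell content as a candidate entity name.
            snippets = " ".join(fetch_snippets(cell))
            for type_name, score in score_text(snippets).items():
                if score > best_score:
                    best, best_score = (idx, type_name), score
        annotations.append(best if best_score >= min_score else None)
    return annotations


# Stub standing in for a Web search; a real system would query a search API.
FAKE_WEB: Dict[str, List[str]] = {
    "Le Petit Bistro": ["French cuisine and a seasonal menu by chef Anna"],
    "Louvre": ["The gallery hosts a permanent collection and a new exhibition"],
}


def fake_fetch(query: str) -> List[str]:
    return FAKE_WEB.get(query, [])


table = [
    ["Le Petit Bistro", "12 Rue A", "Paris"],
    ["Louvre", "Rue de Rivoli", "Paris"],
    ["n/a", "unknown", "Paris"],
]
result = annotate_rows(table, fake_fetch)
print(result)
```

In this toy run the first two rows are annotated at their first cell and the third row is left unannotated, which mirrors the two outputs named in the abstract: which rows describe an entity of a known type, and which cell holds its name. Because no reference catalogue is consulted, an entity absent from any catalogue can still be typed as long as the lookup returns descriptive text.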