A semantic enrichment of data tables applied to food risk assessment

Authors:
Hélène Gagliardi;Ollivier Haemmerlé;Nathalie Pernelle;Fatiha Saïs
Affiliations:
LRI (UMR CNRS 8623 – Université Paris-Sud) / INRIA (Futurs), Orsay, France;LRI (UMR CNRS 8623 – Université Paris-Sud) / INRIA (Futurs), Orsay, France;LRI (UMR CNRS 8623 – Université Paris-Sud) / INRIA (Futurs), Orsay, France;LRI (UMR CNRS 8623 – Université Paris-Sud) / INRIA (Futurs), Orsay, France
Venue:
DS'05 Proceedings of the 8th international conference on Discovery Science
Year:
2005

Citing (4):

A survey of approaches to automatic schema matching
The VLDB Journal - The International Journal on Very Large Data Bases

Extracting structured data from Web pages
Proceedings of the 2003 ACM SIGMOD international conference on Management of data

Towards the self-annotating web
Proceedings of the 13th international conference on World Wide Web

Profile-Based Object Matching for Information Integration
IEEE Intelligent Systems

Cited by (2):

The MIEL++ architecture when RDB, CGs and XML meet for the sake of risk assessment in food products
ICCS'06 Proceedings of the 14th international conference on Conceptual Structures: Inspiration and Application

Approximate querying of XML fuzzy data
FQAS'06 Proceedings of the 7th international conference on Flexible Query Answering Systems

Abstract

Our work deals with the automatic construction of domain-specific data warehouses. Our application domain concerns microbiological risks in food products. The MIEL++ system [2], implemented during the Sym'Previus project, is a tool based on a database containing experimental and industrial results about the behavior of pathogenic germs in food products. This database is incomplete by nature, since the number of possible experiments is potentially infinite. Our work, developed within the e.dot project, presents a way of compensating for that incompleteness by complementing the database with data automatically extracted from the Web. We propose to query these data through a mediated architecture based on a domain ontology, so we need to make them compatible with the ontology. In the e.dot project [5], we focus exclusively on documents in HTML or PDF format which contain data tables. Data tables are a very common presentation scheme for describing synthetic data in scientific articles. These tables are semantically enriched, and we want this enrichment to be as automatic and flexible as possible. Thus, we have defined a Document Type Definition named SML (Semantic Markup Language) which can deal with additional or incomplete information in a semantic relation, ambiguities, or possible interpretation errors. In this paper, we present this semantic enrichment step.