The design and construction of lexical resources is a critical issue in Natural Language Processing (NLP). Real-world NLP systems need large-scale lexica that provide rich information about words and word senses at every level: morphological, syntactic, lexical-semantic, and so on. The construction of such resources, however, is a difficult and costly task. The last decade has been strongly influenced by the notion of reusability, that is, exploiting the information in existing lexical resources when constructing new ones. It is unrealistic, however, to expect that the great variety of available lexical information resources will be converted into a single, standard representation schema in the near future. The purpose of this article is to present the ELHISA system, a software architecture for the integration of heterogeneous lexical information. We address, from the point of view of the information integration field, the problem of querying very different existing lexical information sources using a single, common query language. Integration in ELHISA is performed at the logical level, so the lexical resources are not modified in any way when they are incorporated into the system. ELHISA is primarily defined as a consultation system for accessing structured lexical information, and therefore it does not have the capability to modify or update the underlying information. To this end, a General Conceptual Model (GCM) for describing diverse lexical data has been devised. The GCM establishes a fixed vocabulary describing the objects in the lexical information domain, their attributes, and the relationships among them. To integrate a lexical resource into the federation, a Source Conceptual Model (SCM) is built on top of it, representing the lexical objects occurring in that particular source.
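The GCM/SCM split described above can be illustrated with a minimal sketch. All class, field, and resource names here are invented for the example and are not taken from ELHISA itself; the point is only that the SCM wrapper maps source-specific fields onto the GCM's fixed vocabulary without modifying the source data.

```python
# Hypothetical sketch of a GCM entry and an SCM wrapper for one source.
from dataclasses import dataclass, field

@dataclass
class GCMEntry:
    """A lexical entry in the General Conceptual Model's fixed vocabulary."""
    lemma: str
    pos: str                       # part of speech
    senses: list = field(default_factory=list)

class ToySourceSCM:
    """Source Conceptual Model built on top of one concrete resource
    (here, toy (word, category, gloss) rows); the source is read-only."""
    def __init__(self, source_rows):
        self.rows = source_rows

    def to_gcm(self):
        # Map source-specific fields onto GCM objects and attributes.
        entries = {}
        for word, category, gloss in self.rows:
            entry = entries.setdefault(
                (word, category), GCMEntry(lemma=word, pos=category))
            entry.senses.append(gloss)
        return list(entries.values())

rows = [("bank", "noun", "financial institution"),
        ("bank", "noun", "river edge")]
entries = ToySourceSCM(rows).to_gcm()
# One GCM entry for "bank" (noun) with two senses.
```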
To answer user queries, ELHISA must access the integrated resources and therefore translate each query expressed in GCM terms into queries formulated in terms of the SCM of each source. The relation between the GCM and the SCMs is described explicitly by means of mapping rules called Content Description Rules. Data integration at the extensional level is achieved through a data cleansing process, which is needed in order to compare data arriving from different sources; it is in this process that the object identification step is carried out. Based on this architecture, a prototype named ELHISA has been built, and five resources covering a broad scope have so far been integrated into it for testing purposes. The ease with which such heterogeneous resources have been integrated shows, in the authors' opinion, the suitability of the approach taken.
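The query translation step can be sketched as a simple rewriting driven by per-source mapping rules. The rule tables and field names below are purely illustrative stand-ins for ELHISA's Content Description Rules, assumed here to map each GCM attribute to the corresponding field name in a source's SCM.

```python
# Illustrative sketch: rewriting a GCM-level query into per-source SCM
# queries via mapping rules. All source and field names are invented.
CONTENT_DESCRIPTION_RULES = {
    "source_a": {"lemma": "word",         "pos": "category"},
    "source_b": {"lemma": "lexical_unit", "pos": "unit_pos"},
}

def translate(gcm_query, source):
    """Rewrite a GCM query (attribute -> value) into one source's SCM terms.

    Attributes the source does not describe are simply dropped, since that
    source cannot constrain them.
    """
    rules = CONTENT_DESCRIPTION_RULES[source]
    return {rules[attr]: value
            for attr, value in gcm_query.items() if attr in rules}

q = {"lemma": "run", "pos": "verb"}
q_a = translate(q, "source_a")   # {"word": "run", "category": "verb"}
q_b = translate(q, "source_b")   # {"lexical_unit": "run", "unit_pos": "verb"}
```

A real mediator would also handle attributes a source cannot express (e.g. by post-filtering the merged results during data cleansing), but the dictionary rewrite captures the intensional translation step.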