Semantic web and language resources for e-government: linguistically motivated data mining

Authors:
Annibale Elia;Daniela Vellutino;Federica Marano;Alberto Maria Langella;Antonella Napoli
Affiliations:
Università degli Studi di Salerno, Via Ponte Don Melillo s.n.c., Fisciano (SA), Italy (all authors)
Venue:
Proceedings of the International Conference on Web Intelligence, Mining and Semantics
Year:
2011

Abstract

Language Technologies (LT) perform well when they build on previously developed Language Resources (LR). In this paper we therefore illustrate how to build an efficient data mining system based on a coherent formalization of natural language and on lingware (in machine-readable form) built on the universal concepts of "lexical unit", "meaning unit" and "morphosyntactic context". From a linguistic and semantic point of view, linguistically motivated data mining yields more coherent results than a statistical approach. We developed LR for Natural Language Processing (NLP) applications, consisting of electronic dictionaries of terminological multiword expressions (in machine-readable form) and of local grammars (in the form of finite-state automata and transducers, FSA/FST). Both parts of this lingware were built and applied according to the formalization principles and methods of Lexicon-Grammar (LG). These language resources form the basis of an Information Retrieval System for e-Government. The system is a Semantic Web (SW) application that will automatically recognize a given set of frequently asked questions (FAQs) on European Community information, previously formalized as syntactic patterns in local grammars.
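The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. The first stage looks up terminological multiword expressions in an electronic dictionary. The second stage runs a local grammar, written as a finite-state automaton, that recognizes one FAQ pattern. The sketch assumes a toy English lexicon, a naive tokenizer and a single hand-written grammar. The dictionary entries, the tags `FUND` and `INSTITUTION`, the state numbers and the label `FAQ_APPLY_FUND` are all hypothetical. This is not the authors' Lexicon-Grammar lingware. The automaton is deterministic and emits its label only in the accepting state, so it is a simplified stand-in for a full FST.

```python
import re

# Hypothetical electronic dictionary of terminological multiword expressions
# (MWEs) mapped to semantic tags. Entries are illustrative, not from the paper.
MWE_DICT = {
    ("european", "social", "fund"): "FUND",
    ("structural", "funds"): "FUND",
    ("member", "state"): "INSTITUTION",
    ("european", "commission"): "INSTITUTION",
}
MAX_MWE_LEN = max(len(k) for k in MWE_DICT)


def tokenize(text):
    """Lowercase and split into ASCII word tokens (deliberately naive)."""
    return re.findall(r"[a-z]+", text.lower())


def tag_mwes(tokens):
    """Greedy longest-match lookup: replace dictionary MWEs by <TAG> symbols."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_MWE_LEN, len(tokens) - i), 0, -1):
            tag = MWE_DICT.get(tuple(tokens[i:i + n]))
            if tag is not None:
                out.append("<" + tag + ">")
                i += n
                break
        else:  # no dictionary entry starts here: keep the plain word
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical local grammar for one FAQ ("how can/do I apply for (the) <FUND>"):
# TRANSITIONS[state][symbol] -> next state; symbols are words or <TAG>s.
# FINAL maps accepting states to the output label (a FAQ identifier).
TRANSITIONS = {
    0: {"how": 1},
    1: {"can": 2, "do": 2},
    2: {"i": 3},
    3: {"apply": 4},
    4: {"for": 5},
    5: {"the": 7, "<FUND>": 6},  # optional determiner before the term
    7: {"<FUND>": 6},
}
FINAL = {6: "FAQ_APPLY_FUND"}


def recognize(sentence):
    """Return the FAQ label if the whole tagged sentence is accepted, else None."""
    state = 0
    for symbol in tag_mwes(tokenize(sentence)):
        state = TRANSITIONS.get(state, {}).get(symbol)
        if state is None:
            return None
    return FINAL.get(state)


if __name__ == "__main__":
    print(recognize("How can I apply for the European Social Fund?"))  # FAQ_APPLY_FUND
    print(recognize("Who chairs the European Commission?"))  # None
```

Greedy longest-match lookup is one common way to prefer a multiword entry such as "european social fund" over its component words. A realistic system would also need inflected forms, ambiguity handling and pattern search inside longer texts rather than whole-sentence matching.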