Language Technologies (LT) perform well when they rely on previously developed Language Resources (LR). Hence, in this paper we illustrate how to build an efficient data mining system based on a coherent formalization of natural language and on a lingware (in Machine-Readable Form) built on the universal concepts of "lexical unit", "meaning unit" and "morphosyntactic context". From a linguistic and semantic point of view, linguistically motivated data mining yields more coherent results than a statistical approach. We developed LR for Natural Language Processing (NLP) applications, consisting of electronic dictionaries of terminological multiword expressions (in Machine-Readable Form) and of local grammars (in the form of finite-state automata and transducers, FSA/FST). Both parts of this lingware were built and applied according to the formalization principles and methods of Lexicon-Grammar (LG). These language resources form the basis of an Information Retrieval System for e-Government. The system is a Semantic Web (SW) application that will automatically recognize a given set of frequently asked questions (FAQs) about European Community information, previously formalized as syntactic patterns within local grammars.
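The pipeline outlined above can be illustrated with a minimal sketch. First, terminological multiword expressions are looked up in a dictionary. Then the result is matched against a local grammar that encodes FAQs as syntactic patterns. The dictionary entries, the automaton's states and the labels `FAQ_APPLY` and `FAQ_DEFINITION` are hypothetical. This is not the authors' lingware or their LG formalization. It only shows longest-match multiword lookup feeding a deterministic finite-state recognizer that emits an FAQ label when it reaches an accepting state.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical terminological dictionary: multiword expression -> category.
# Entries are illustrative only, not taken from the paper's resources.
MWE_DICTIONARY: Dict[Tuple[str, ...], str] = {
    ("structural", "funds"): "TERM",
    ("european", "social", "fund"): "TERM",
    ("member", "state"): "TERM",
}

# Hypothetical local grammar compiled into a deterministic automaton:
# (state, input symbol) -> next state.
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "how"): 1, (1, "can"): 2, (2, "i"): 3, (3, "apply"): 4,
    (4, "for"): 5, (5, "<TERM>"): 6,
    (0, "what"): 10, (10, "is"): 11, (11, "a"): 12,
    (11, "<TERM>"): 13, (12, "<TERM>"): 13,
}
# Accepting states and the (hypothetical) FAQ identifier each one emits.
ACCEPTING: Dict[int, str] = {6: "FAQ_APPLY", 13: "FAQ_DEFINITION"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tag_multiwords(tokens: List[str]) -> List[str]:
    """Replace the longest dictionary match at each position with its tag."""
    max_len = max(len(key) for key in MWE_DICTIONARY)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in MWE_DICTIONARY:
                out.append("<" + MWE_DICTIONARY[key] + ">")
                i += n
                break
        else:
            # No dictionary entry starts here: keep the simple word.
            out.append(tokens[i])
            i += 1
    return out


def recognize(symbols: List[str]) -> Optional[str]:
    """Run the automaton; return an FAQ label only if the whole input is accepted."""
    state = 0
    for sym in symbols:
        nxt = TRANSITIONS.get((state, sym))
        if nxt is None:
            return None  # no transition: the question matches no pattern
        state = nxt
    return ACCEPTING.get(state)


def match_faq(question: str) -> Optional[str]:
    """Tokenize, tag multiword terms, then match against the local grammar."""
    return recognize(tag_multiwords(tokenize(question)))


if __name__ == "__main__":
    print(match_faq("How can I apply for Structural Funds?"))  # FAQ_APPLY
    print(match_faq("What is a member state?"))  # FAQ_DEFINITION
    print(match_faq("Where is Brussels?"))  # None
```

In this sketch, every recognized multiword expression is replaced with its category before the automaton runs. That keeps the grammar small, because a single transition on `<TERM>` covers every dictionary entry. This loosely mirrors the way the abstract pairs electronic dictionaries with local grammars.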