A morphological analyzer using hash tables in main memory (MAHT) and a lexical knowledge base

Authors:
Francisco J. Carreras-Riudavets;Juan C. Rodríguez-del-Pino;Zenón Hernández-Figueroa;Gustavo Rodríguez-Rodríguez
Affiliations:
Departamento de Informática y Sistemas, Universidad de Las Palmas de Gran Canaria, Las Palmas, Spain;Departamento de Informática y Sistemas, Universidad de Las Palmas de Gran Canaria, Las Palmas, Spain;Departamento de Informática y Sistemas, Universidad de Las Palmas de Gran Canaria, Las Palmas, Spain;Departamento de Informática y Sistemas, Universidad de Las Palmas de Gran Canaria, Las Palmas, Spain
Venue:
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Year:
2012

Abstract

This paper presents MAHT, a morphological analyzer for the Spanish language. The system is based mainly on storing words together with their morphological information, which yields a lexical knowledge base of almost five million words. The lexical knowledge base covers practically the whole range of morphological cases in Spanish. Prefixes and enclitic pronouns, however, are handled by simple rules, since the words that can include these elements are numerous and some of them are neologisms. MAHT reaches an average processing speed of over 275,000 words per second, which is possible because it uses hash tables in main memory. MAHT has been designed to isolate the data from the algorithms that analyze words, including their irregular forms. This design is very important for an irregular and highly inflectional language like Spanish, because it simplifies the insertion of new words and the maintenance of the program code.
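
The general idea in the abstract (a direct lookup in an in-memory hash table, with rule-based fallbacks for enclitic pronouns and prefixes) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy lexicon, the prefix and pronoun lists, the feature strings, and the function names `strip_acute`, `split_enclitics`, and `analyze` are all assumptions made for illustration, and a Python `dict` stands in for the hash tables.

```python
import unicodedata

# Hypothetical toy lexicon: surface form -> list of (lemma, category, features).
# A Python dict is a hash table held in main memory.
LEXICON = {
    "casas": [("casa", "noun", "feminine plural"),
              ("casar", "verb", "present indicative, 2nd person singular")],
    "da": [("dar", "verb", "present indicative, 3rd person singular"),
           ("dar", "verb", "imperative, 2nd person singular")],
    "abrir": [("abrir", "verb", "infinitive")],
}

# Illustrative (incomplete) rule data, assumed for this sketch.
PREFIXES = ("anti", "pre", "re")
ENCLITICS = ("nos", "los", "las", "les", "me", "te", "se", "lo", "la", "le", "os")
ENCLITIC_HOSTS = ("imperative", "infinitive", "gerund")


def strip_acute(word):
    """Remove acute accents only, leaving characters such as ñ and ü intact."""
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace("\u0301", ""))


def split_enclitics(word, max_pronouns=2):
    """Return every (stem, pronouns) split obtained by peeling final pronouns."""
    splits = []

    def peel(rest, pronouns):
        if pronouns:
            splits.append((rest, pronouns))
        if len(pronouns) < max_pronouns:
            for pronoun in ENCLITICS:
                if rest.endswith(pronoun) and len(rest) > len(pronoun):
                    peel(rest[: -len(pronoun)], [pronoun] + pronouns)

    peel(word, [])
    return splits


def analyze(word):
    """Look the word up directly; fall back to enclitic and prefix rules."""
    word = word.lower()
    found = [dict(lemma=l, category=c, features=f)
             for l, c, f in LEXICON.get(word, [])]
    if found:
        return found
    # Rule 1: verb form + enclitic pronouns (the written accent may be added
    # when pronouns attach, so the unaccented stem is tried as well).
    for stem, pronouns in split_enclitics(word):
        for key in dict.fromkeys((stem, strip_acute(stem))):
            for l, c, f in LEXICON.get(key, []):
                if c == "verb" and any(h in f for h in ENCLITIC_HOSTS):
                    found.append(dict(lemma=l, category=c, features=f,
                                      enclitics=pronouns))
    # Rule 2: known prefix + word stored in the lexicon.
    for prefix in PREFIXES:
        if word.startswith(prefix):
            for l, c, f in LEXICON.get(word[len(prefix):], []):
                found.append(dict(lemma=l, category=c, features=f,
                                  prefix=prefix))
    return found
```

In this sketch, `analyze("casas")` is resolved by a single dictionary lookup, while `analyze("dámelo")` and `analyze("reabrir")` are resolved by the fallback rules. Because the rules only reduce a word to a form already stored in the dictionary, new words can be added to `LEXICON` without touching the functions, which mirrors the separation of data from algorithms described in the abstract.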