Expert Systems with Applications: An International Journal
This paper presents MAHT, a morphological analyzer for the Spanish language. The system is based mainly on storing words together with their morphological information, which yields a lexical knowledge base of almost five million words. This knowledge base covers practically the whole range of morphological cases in Spanish. Prefixes and enclitic pronouns, however, are handled by simple rules rather than by storage, because the words that can take these elements are numerous and some of them are neologisms. MAHT reaches an average processing speed of over 275,000 words per second, which is possible because it uses hash tables in main memory. MAHT has been designed to isolate the data from the algorithms that analyze words, including irregular forms. For an irregular and highly inflectional language such as Spanish, this design is very important because it simplifies both the insertion of new words and the maintenance of the program code.
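The general idea can be illustrated with a minimal Python sketch: stored forms are found by direct hash lookup, and simple rules cover enclitic pronouns and prefixes. This is not MAHT's implementation. The toy lexicon, the tag notation, the prefix and enclitic lists, and the names `LEXICON`, `analyze`, and `_strip_enclitics` are all assumptions made for illustration.

```python
# Illustrative sketch only; not MAHT's actual data or algorithm.
# Hypothetical toy lexicon: surface form -> list of (lemma, tags).
# A Python dict is an in-memory hash table, so lookup is O(1) on average.
LEXICON = {
    "canto": [("cantar", "V;IND;PRS;1;SG"), ("canto", "N;MASC;SG")],
    "da": [("dar", "V;IMP;2;SG"), ("dar", "V;IND;PRS;3;SG")],
    "comprar": [("comprar", "V;INF")],
    "hacer": [("hacer", "V;INF")],
}

# Assumed, deliberately incomplete rule data, kept apart from the code below.
PREFIXES = ("anti", "pre", "re")
ENCLITICS = ("los", "las", "les", "nos", "me", "te", "se", "lo", "la", "le", "os")
CLITIC_HOSTS = {"IMP", "INF", "GER"}  # verb forms that can carry enclitics

# Enclitics can add a written accent (da -> dámelo), so strip acute accents
# from the stem before looking it up. Letters such as ñ and ü are untouched.
_UNACCENT = str.maketrans("áéíóú", "aeiou")


def _strip_enclitics(word, max_clitics=2):
    """Return (stem, clitics) candidates obtained by removing final enclitics."""
    results = []

    def walk(stem, clitics):
        if clitics:
            results.append((stem, tuple(clitics)))
        if len(clitics) == max_clitics:
            return
        for clitic in ENCLITICS:
            if stem.endswith(clitic) and len(stem) > len(clitic):
                walk(stem[: -len(clitic)], [clitic] + clitics)

    walk(word, [])
    return results


def analyze(word):
    """Return a list of (lemma, tags, prefix, clitics) analyses for a word."""
    word = word.lower()

    # 1. Direct lookup of the stored form.
    if word in LEXICON:
        return [(lemma, tags, None, ()) for lemma, tags in LEXICON[word]]

    # 2. Enclitic rule: strip pronouns, then look up the unaccented stem.
    analyses = []
    for stem, clitics in _strip_enclitics(word):
        for lemma, tags in LEXICON.get(stem.translate(_UNACCENT), []):
            parts = tags.split(";")
            if parts[0] == "V" and parts[1] in CLITIC_HOSTS:
                analyses.append((lemma, tags, None, clitics))
    if analyses:
        return analyses

    # 3. Prefix rule: strip a known prefix and look up the remainder.
    for prefix in PREFIXES:
        rest = word[len(prefix):]
        if word.startswith(prefix) and rest in LEXICON:
            analyses.extend((lemma, tags, prefix, ()) for lemma, tags in LEXICON[rest])
    return analyses
```

In this sketch, `analyze("dámelo")` resolves to the imperative of `dar` with the clitics `("me", "lo")`, and `analyze("rehacer")` resolves to `hacer` with the prefix `re`. Because the lexicon and the rule lists are plain data, new words can be added without changing the analysis functions, which mirrors the separation of data and algorithms that the abstract describes.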