Stemming reduces words to a common root, usually by removing suffixes, and has applications in text search, machine translation, document summarization, and text classification. For example, English stemming reduces the words "computer," "computing," "computation," and "computability" to their common morphological root, "comput-." In text search, this permits a search for "computers" to find documents containing any word with the stem "comput-." In the Indonesian language, stemming is of crucial importance: words have prefixes, suffixes, infixes, and confixes that make matching related words difficult. This work surveys existing techniques for stemming Indonesian words to their morphological roots, presents our novel and highly accurate CS algorithm, and explores the effectiveness of stemming in the context of general-purpose text information retrieval through ad hoc queries.
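
The English example above can be illustrated with a minimal sketch. This is not the CS algorithm and does not handle Indonesian prefixes, infixes, or confixes; it is a toy suffix stripper whose suffix list, function names (`toy_stem`, `build_index`, `search`), and sample documents are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical suffix list, chosen only to cover the example words above.
# Sorted longest-first so "ability" is tried before "s", and so on.
SUFFIXES = sorted(["ability", "ation", "ers", "er", "ing", "s"],
                  key=len, reverse=True)


def toy_stem(word, min_stem=3):
    """Strip the longest matching suffix, keeping at least min_stem letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stem to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z]+", text.lower()):
            index[toy_stem(token)].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents sharing the query word's stem."""
    return sorted(index.get(toy_stem(query), set()))


# Made-up sample documents for the usage example.
DOCS = {
    1: "Computing the answer took hours.",
    2: "The theory of computability is old.",
    3: "Bananas are yellow.",
}
INDEX = build_index(DOCS)
print(search(INDEX, "computers"))
```

Because the query and the indexed words are stemmed the same way, a search for "computers" matches documents 1 and 2 even though neither contains that exact word. A real stemmer needs far more careful rules than this suffix list to avoid over- and under-stemming.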