This paper proposes a framework for language-independent morphological analysis and concentrates mainly on tokenization, the first step of morphological analysis. Although tokenization is usually not regarded as a difficult task in segmented languages such as English, a number of problems arise in treating lexical entries precisely. We first introduce the concept of morpho-fragments, which are intermediate units between characters and lexical entries. We then describe our approach to resolving the problems that arise in tokenization, so as to attain a language-independent morphological analyzer.