Segmenting sentences into linky strings using d-bigram statistics

Authors:
Shiho Nobesawa;Junya Tsutsumi;Sun Da Jiang;Tomohisa Sano;Kengo Sato;Masakazu Nakanishi
Affiliations:
Keio University, Yokohama, Japan;Keio University, Yokohama, Japan;Keio University, Yokohama, Japan;Keio University, Yokohama, Japan;Keio University, Yokohama, Japan;Keio University, Yokohama, Japan
Venue:
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Year:
1996

Abstract

Segmentation plays an important role in natural language processing (NLP), especially for languages whose sentences are not easily separated into morphemes. In this study we propose a method of segmenting a sentence. The system described in this paper does not use any grammatical information or knowledge in processing. Instead, it uses statistical information drawn from a non-tagged corpus of the target language. Most segmenting systems aim to pick out conventional morphemes, which are defined for human use. However, we still do not know whether those conventional morphemes are good units for computational processing. In this paper we explain our system's algorithm and its experimental results on Japanese, though the system is not designed for a particular language.