Complete inverted files for efficient text retrieval and analysis
Journal of the ACM (JACM)
Text algorithms
Data mining using two-dimensional optimized association rules: scheme, algorithms, and visualization
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
From data mining to knowledge discovery: an overview
Advances in knowledge discovery and data mining
Discovering Characteristic Patterns from Collections of Classical Japanese Poems
DS '98 Proceedings of the First International Conference on Discovery Science
Discovering Poetic Allusion in Anthologies of Classical Japanese Poems
DS '99 Proceedings of the Second International Conference on Discovery Science
Hi-index | 0.00 |
We attempt to extract characteristic expressions from literary works. That is, our problem is, given literary works by a particular writer as positive examples and works by another writer as negative examples, to find expressions that appear frequently in the positive examples but do not so in the negative examples. It is considered as a special case of the optimal pattern discovery from textual data, in which only the substring patterns are considered. One reasonable approach is to create a list of substrings arranged in the descending order of their goodness, and to examine a first part of the list by a human expert. Since there is no word boundary in Japanese texts, a substring is often a fragment of a word or a phrase. How to assist the human expert is a key to success in discovery. In this paper, we propose (1) to restrict to the prime substrings in order to remove redundancy from the list, and (2) a way of browsing the neighbor of a focused string as well as its context. Using this method, we report successful results against two pairs of anthologies of classical Japanese poems. We expect that the extracted expressions will possibly lead to discovering overlooked aspects of individual poets.