Key semantics extraction by dependency tree mining

Authors:
Satoshi Morinaga; Hiroki Arimura; Takahiro Ikeda; Yosuke Sakao; Susumu Akamine
Affiliations:
NEC Corporation, Kawasaki, Kanagawa, Japan; Hokkaido University, Sapporo, Hokkaido, Japan; NEC Corporation, Kawasaki, Kanagawa, Japan; NEC Corporation, Kawasaki, Kanagawa, Japan; NEC Corporation, Kawasaki, Kanagawa, Japan
Venue:
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Year:
2005

Abstract

We propose a new text mining system that extracts characteristic content from given documents. We define Key semantics as characteristic substructures of the syntactic dependencies in the given documents, and we consider the following three tasks: 1) Key semantics extraction: extracting characteristic syntactic dependency structures not only as ordered trees but also as unordered trees and free trees; 2) Redundancy reduction: deleting from the extraction result redundant dependency structures, such as structures that are substructures of, or equivalent to, other extracted structures; and 3) Phrase/sentence reconstruction: generating a natural-language phrase or sentence corresponding to each extracted structure.

Our system combines natural language processing techniques with tree mining techniques. It consists of five units: 1) a syntactic dependency analysis unit, 2) input filters, 3) a characteristic ordered subtree extraction unit, 4) output filters, and 5) a phrase/sentence reconstruction unit. Although the third unit extracts only ordered trees, the overall behavior of the system can be switched among the extraction of ordered trees, unordered trees, or free trees, depending on which input filters are applied in the second unit. The output filters delete redundant trees from the extraction result to make knowledge discovery more efficient. Finally, phrases or sentences corresponding to the extracted subtrees are reconstructed using the input documents.

We demonstrate the validity of our system through experimental results on real data collected at a help desk and on the TDT pilot corpus.
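The abstract says that input filters applied before an ordered-subtree miner change which kind of tree is effectively mined, and that output filters remove equivalent structures. The abstract does not specify how those filters work. The following Python sketch shows one standard way to get this behavior for the unordered case. It assumes a hypothetical `Node` type with string labels and hypothetical helpers `encode`, `sort_siblings`, and `dedupe_equivalent`, none of which come from the paper. The idea is to sort siblings recursively into a canonical order (an AHU-style canonical form), so that trees differing only in sibling order become identical ordered trees. The same canonical encoding can then act as an output filter that keeps one representative per equivalence class. Free trees would also need a canonical choice of root, which this sketch does not cover.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical dependency-tree node: a word label plus its dependents."""
    label: str
    children: List["Node"] = field(default_factory=list)


def encode(node: Node) -> str:
    """Serialize an ordered tree as label(child1,child2,...).

    Assumes labels contain no '(', ')' or ',' so the encoding is unambiguous.
    """
    if not node.children:
        return node.label
    return node.label + "(" + ",".join(encode(c) for c in node.children) + ")"


def sort_siblings(node: Node) -> Node:
    """Hypothetical input filter: return a copy whose siblings are in canonical order.

    Children are canonicalized first, then sorted by their encodings, so two
    trees that differ only in sibling order map to the same ordered tree.
    The input tree is not modified.
    """
    kids = [sort_siblings(c) for c in node.children]
    kids.sort(key=encode)
    return Node(node.label, kids)


def dedupe_equivalent(patterns: List[Node]) -> List[Node]:
    """Hypothetical output filter: keep the first pattern of each
    unordered-equivalence class and drop later equivalent ones."""
    seen = set()
    kept = []
    for p in patterns:
        key = encode(sort_siblings(p))
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept


if __name__ == "__main__":
    # Two made-up dependency trees that differ only in the order of dependents.
    t1 = Node("print", [Node("printer"), Node("not")])
    t2 = Node("print", [Node("not"), Node("printer")])
    print(encode(t1), encode(t2))  # print(printer,not) print(not,printer)
    print(encode(sort_siblings(t1)))  # print(not,printer)
    print(len(dedupe_equivalent([t1, t2])))  # 1
```

Every subtree is canonicalized before its parent's children are sorted. As a result, two trees receive equal encodings exactly when they are equal up to reordering of siblings, provided the labels contain none of the delimiter characters.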