We propose a new text mining system that extracts characteristic contents from given documents. We define Key semantics as characteristic sub-structures of the syntactic dependencies in the given documents, and consider the following three tasks in this paper: 1) Key semantics extraction: extracting characteristic syntactic dependency structures not only as ordered trees but also as unordered trees and free trees; 2) Redundancy reduction: deleting from the extraction result redundant dependency structures, such as those that are sub-structures of, or equivalent to, other extracted structures; and 3) Phrase/sentence reconstruction: generating a natural-language phrase or sentence corresponding to each extracted structure.

Our system combines natural language processing techniques with tree mining techniques. It consists of the following five units: 1) syntactic dependency analysis unit, 2) input filters, 3) characteristic ordered subtree extraction unit, 4) output filters, and 5) phrase/sentence reconstruction unit. Although the third unit always extracts ordered trees, the overall behavior of the system can be switched among the extraction of ordered trees, unordered trees, or free trees, depending on which input filters are applied in the second unit. The output filters delete redundant trees from the extraction result for efficient knowledge discovery. Finally, phrases or sentences corresponding to the extracted subtrees are reconstructed by utilizing the input documents.

We demonstrate the validity of our system by showing experimental results using real data collected at a help desk and the TDT pilot corpus.