Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
An introduction to variable and feature selection
The Journal of Machine Learning Research
Improving accuracy in word class tagging through the combination of machine learning systems
Computational Linguistics
A stochastic parts program and noun phrase parser for unrestricted text
ANLC '88 Proceedings of the second conference on Applied natural language processing
Identifying hierarchical structure in sequences: a linear-time algorithm
Journal of Artificial Intelligence Research
VOGUE: a novel variable order-gap state machine for modeling sequences
PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases
VOGUE: A variable order hidden Markov model with duration based on frequent sequence mining
ACM Transactions on Knowledge Discovery from Data (TKDD)
Hi-index | 0.00 |
We present a text mining approach that enables an extension of a standard authorship assessment problem (the problem in which an author of a text needs to be established) to role identification in communications within some Internet community. More precisely, we want to recognize a group of authors communicating in a specific role within such a community rather than a single author. The challenge here is that the same author may participate in different roles in communications within the group, in each role having different authors as peers. An additional challenge of our problem is the length of communications. Each individual exchange in our intended domain, communications within an Internet community, is relatively short, in the order of several dozens of words, so standard text mining approaches may fail. An example of such a problem is recognizing roles in a collection of emails from an organization in which middle level managers communicate both with superiors and subordinates. To validate our approach we use the Enron email dataset which is such a collection. Our approach is based on discovering patterns at varying degrees of abstraction in a hierarchical fashion. Such discovery process allows for certain degree of approximation in matching patterns, which is necessary for capturing non-trivial structures in realistic datasets. The discovered patterns are used as features to build efficient classifiers. Due to the nature of the pattern discovery process, we call our approach Recursive Data Mining. The results show that a classifier that uses the dominant patterns discovered by Recursive Data Mining performs well in role detection.