We present a text mining approach that discovers patterns at varying degrees of abstraction in a hierarchical fashion. The approach allows for a certain degree of approximation in matching patterns, which is necessary to capture non-trivial features in realistic datasets. Because of its recursive nature, we call this approach Recursive Data Mining (RDM). We demonstrate a novel application of RDM to role identification in electronic communications. We use a hybrid approach in which the RDM-discovered patterns are used as features to build efficient classifiers. Since we want to recognize a group of authors communicating in a specific role within an Internet community, the challenge is to recognize the possibly different roles of an author within different communication communities. Moreover, each individual exchange in electronic communications is typically short, making standard text mining approaches less efficient than in other applications. An example of such a problem is recognizing roles in a collection of emails from an organization in which middle-level managers communicate with both superiors and subordinates. To validate our approach, we use the Enron dataset, which is such a collection. The results show that a classifier that uses the dominant patterns discovered by Recursive Data Mining performs well in role identification.
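The hierarchical idea can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes exact matching of adjacent token pairs and a raw occurrence threshold, whereas the approach described above permits approximate matches. The function names (`mine_level`, `rewrite`, `recursive_mine`), the `min_support` and `max_levels` parameters, and the `L<level>_P<index>` token naming are all hypothetical choices made for illustration.

```python
from collections import Counter


def mine_level(sequences, level, min_support):
    """Map each adjacent token pair seen at least min_support times to a new token."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    frequent = sorted(pair for pair, c in counts.items() if c >= min_support)
    # Hypothetical naming scheme: L<level>_P<index>
    return {pair: f"L{level}_P{i}" for i, pair in enumerate(frequent)}


def rewrite(seq, pair_to_token):
    """Greedily replace known pairs, left to right, with their pattern token."""
    out, i = [], 0
    while i < len(seq):
        pair = tuple(seq[i:i + 2])
        if len(pair) == 2 and pair in pair_to_token:
            out.append(pair_to_token[pair])
            i += 2  # both tokens of the pair are consumed
        else:
            out.append(seq[i])
            i += 1
    return out


def recursive_mine(sequences, min_support=2, max_levels=3):
    """Mine patterns level by level; each level works on the previous level's output.

    Assumption: exact pair matching only, unlike the approximate matching
    described in the abstract.
    """
    patterns = {}
    current = [list(s) for s in sequences]
    for level in range(1, max_levels + 1):
        pair_to_token = mine_level(current, level, min_support)
        if not pair_to_token:
            break  # nothing frequent enough at this level
        patterns.update({tok: pair for pair, tok in pair_to_token.items()})
        current = [rewrite(s, pair_to_token) for s in current]
    return patterns, current
```

Calling `recursive_mine` on tokenized messages returns a dictionary from pattern tokens to their constituents, plus the rewritten sequences. Higher-level patterns are built from lower-level pattern tokens, and per-message counts of those tokens could then serve as features for a classifier.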