In this paper, we describe experiments on identifying a person from a novel correlated corpus of text and audio samples of that person's communication in six genres. The text samples comprise essays, emails, blogs, and chat. The audio samples were collected from individual interviews and group discussions and then transcribed to text. For each genre, samples were collected on six topics. We show that the communicant can be identified with an accuracy of 71% under six-fold cross-validation, using an average of 22,000 words per individual across the six genres. For person identification in a particular genre (train on five genres, test on one), the average accuracy is 82%. For identification from topics (train on five topics, test on one), the average accuracy is 94%. We also report results on identifying a person's communication in a genre using text genres only and using audio genres only.
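
The two hold-out protocols in the abstract (train on five genres and test on the sixth, or train on five topics and test on the sixth) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample layout (dictionaries with `person`, `genre`, `topic`, and `text` keys), the function names, and above all the word-overlap classifier, which is only a stand-in because the abstract does not name the features or the learner that were used.

```python
from collections import Counter, defaultdict


def leave_one_group_out(samples, key):
    """Yield (held_out, train, test), holding out one value of `key` per fold.

    `key` is "genre" or "topic" (hypothetical field names).
    """
    for held_out in sorted({s[key] for s in samples}):
        train = [s for s in samples if s[key] != held_out]
        test = [s for s in samples if s[key] == held_out]
        yield held_out, train, test


def train_profiles(train):
    """Build one bag-of-words profile per person (stand-in model)."""
    profiles = defaultdict(Counter)
    for s in train:
        profiles[s["person"]].update(s["text"].lower().split())
    return profiles


def predict(profiles, text):
    """Return the person whose profile shares the most word occurrences."""
    words = Counter(text.lower().split())

    def overlap(person):
        profile = profiles[person]
        return sum(min(count, profile[w]) for w, count in words.items())

    # sorted() makes tie-breaking deterministic.
    return max(sorted(profiles), key=overlap)


def evaluate(samples, key):
    """Average per-fold accuracy over all held-out values of `key`."""
    accuracies = []
    for _, train, test in leave_one_group_out(samples, key):
        profiles = train_profiles(train)
        correct = sum(predict(profiles, s["text"]) == s["person"] for s in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)
```

Calling `evaluate(samples, "genre")` corresponds to the genre protocol and `evaluate(samples, "topic")` to the topic protocol. The average is taken over folds, matching the "average accuracy" wording of the abstract, although the exact averaging used in the experiments is not stated there.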