The two main factors that characterize a text are its content and its style, and both can be used as a means of categorization. In this paper we present an approach to text categorization in terms of genre and author for Modern Greek. In contrast to previous stylometric approaches, we attempt to take full advantage of existing natural language processing (NLP) tools. To this end, we propose a set of style markers including analysis-level measures that represent the way in which the input text has been analyzed and capture useful stylistic information without additional cost. We present a set of small-scale but reasonable experiments in text genre detection, author identification, and author verification tasks and show that the proposed method performs better than the most popular distributional lexical measures, i.e., functions of vocabulary richness and frequencies of occurrence of the most frequent words. All the presented experiments are based on unrestricted text downloaded from the World Wide Web without any manual text preprocessing or text sampling. Various performance issues regarding the training set size and the significance of the proposed style markers are discussed. Our system can be used in any application that requires fast and easily adaptable text categorization in terms of stylistically homogeneous categories. Moreover, the procedure of defining analysis-level markers can be followed in order to extract useful stylistic information using existing text processing tools.
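For readers unfamiliar with the distributional lexical measures used as the comparison baseline, the following is a minimal sketch of the two kinds named above: a vocabulary richness function and the relative frequencies of the most frequent words. It is not the paper's implementation. The regex tokenizer, the choice of the type-token ratio as the richness function, and all function names are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens. A plain regex tokenizer is an assumption here;
    # \w matches Unicode letters in Python 3, so Greek text is handled too.
    return re.findall(r"\w+", text.lower())


def type_token_ratio(tokens):
    # One simple vocabulary richness function: distinct word forms (V)
    # divided by total tokens (N). Other richness functions exist.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def top_words(corpus_tokens, k):
    # The k most frequent words over the whole training corpus.
    return [word for word, _ in Counter(corpus_tokens).most_common(k)]


def frequent_word_features(tokens, vocabulary):
    # Relative frequency, within one text, of each word in `vocabulary`.
    counts = Counter(tokens)
    n = len(tokens)
    return [counts[word] / n if n else 0.0 for word in vocabulary]


# Hypothetical toy corpus, for illustration only.
texts = ["the cat sat on the mat", "the dog and the cat"]
tokenized = [tokenize(t) for t in texts]
vocabulary = top_words([w for doc in tokenized for w in doc], k=2)
features = [
    [type_token_ratio(doc)] + frequent_word_features(doc, vocabulary)
    for doc in tokenized
]
```

Each text becomes a short numeric vector that any standard classifier can consume. The type-token ratio depends strongly on text length, which makes measures of this kind hard to compare across texts of different sizes.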