The purpose of authorship search is to identify documents written by a particular author in a large document collection. Standard search engines match documents to queries by topic, and so are not suited to authorship search. In this paper we propose an approach to authorship search based on information theory: documents are ranked by the relative entropy of their style markers, an approach inspired by the language models used in information retrieval. Our experiments on collections of newswire text show that, with simple style markers and sufficient training data, documents by a particular author can be found accurately in large collections. Although effectiveness degrades as collection size increases, even with 500,000 documents nearly half of the top-ranked documents are correct matches. We also find that the authorship search approach can be used for authorship attribution, and that it is much more scalable than state-of-the-art approaches in terms of both collection size and number of candidate authors.
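The ranking idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the style markers, the smoothing method, or the direction of the divergence, so the ten-word function-word list, the Dirichlet smoothing with parameter `mu`, the add-one background estimate, and the helper names below are all assumptions made for illustration. The sketch builds a smoothed marker distribution for the author's known texts and for each candidate document, then ranks documents by increasing relative entropy (Kullback-Leibler divergence) from the author model.

```python
import math
from collections import Counter

# Hypothetical style markers: a tiny function-word list chosen for illustration.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "for"]


def marker_counts(text):
    """Count each style marker in a lowercased, whitespace-tokenized text."""
    counts = Counter(t for t in text.lower().split() if t in FUNCTION_WORDS)
    return [counts[w] for w in FUNCTION_WORDS]


def smoothed_model(counts, background, mu=10.0):
    """Dirichlet-smoothed marker distribution (mu is an assumed parameter)."""
    total = sum(counts)
    return [(c + mu * b) / (total + mu) for c, b in zip(counts, background)]


def relative_entropy(p, q):
    """Relative entropy D(p || q) in bits; q must be positive wherever p is."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def rank_by_style(author_texts, documents):
    """Return document indices ordered from most to least similar in style."""
    doc_counts = [marker_counts(d) for d in documents]
    author_counts = marker_counts(" ".join(author_texts))

    # Background model: marker frequencies over all available text, with
    # add-one counts so that no marker has zero probability (an assumption).
    pooled = [sum(col) + 1 for col in zip(*doc_counts, author_counts)]
    pooled_total = sum(pooled)
    background = [c / pooled_total for c in pooled]

    p_author = smoothed_model(author_counts, background)
    scored = []
    for index, counts in enumerate(doc_counts):
        q_doc = smoothed_model(counts, background)
        scored.append((relative_entropy(p_author, q_doc), index))

    # Lower relative entropy means the document's style is closer to the author's.
    return [index for _, index in sorted(scored)]
```

Calling `rank_by_style(known_texts, collection)` returns the indices of `collection` with the stylistically closest documents first. Smoothing against a background model keeps every probability non-zero, so the relative entropy stays finite even for short documents that lack some markers.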