Research in automatic text plagiarism detection focuses on algorithms that compare suspicious documents against a collection of reference documents. Recent approaches perform well in identifying copied or modified foreign sections, but they assume a closed world in which a reference collection is given. This article investigates whether a computer program can detect plagiarism when no reference is available, e.g., when the foreign sections stem from a book that does not exist in digital form. We call this problem class intrinsic plagiarism analysis; it is closely related to the problem of authorship verification. Our contributions are threefold. (1) We organize the algorithmic building blocks for intrinsic plagiarism analysis and authorship verification and survey the state of the art. (2) We show how the meta-learning approach of Koppel and Schler, termed "unmasking", can be employed to post-process unreliable stylometric analysis results. (3) We operationalize and evaluate an analysis chain that combines document chunking, style model computation, one-class classification, and meta learning.
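The first three stages of the analysis chain named in (3) can be illustrated with a minimal sketch. It is not the article's implementation: the function-word list, the chunk size, the z-score outlier rule (a simple stand-in for a trained one-class classifier), and all function names are assumptions chosen for illustration, and the unmasking post-processing step is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny function-word list; real style models use far richer features.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "for")


def chunk_words(text, size=50):
    """Split text into consecutive chunks of `size` words (the last may be shorter)."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    return [words[i:i + size] for i in range(0, len(words), size)]


def style_vector(words):
    """Toy style model: average word length plus relative function-word frequencies."""
    n = max(len(words), 1)
    counts = Counter(words)
    avg_len = sum(len(w) for w in words) / n
    return [avg_len] + [counts[w] / n for w in FUNCTION_WORDS]


def outlier_scores(vectors):
    """Mean absolute z-score of each chunk against the document-wide feature distribution."""
    dims = len(vectors[0])
    count = len(vectors)
    means = [sum(v[d] for v in vectors) / count for d in range(dims)]
    stds = [
        math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / count)
        for d in range(dims)
    ]
    scores = []
    for v in vectors:
        # Features that never vary carry no information and are skipped.
        zs = [abs(v[d] - means[d]) / stds[d] for d in range(dims) if stds[d] > 0]
        scores.append(sum(zs) / len(zs) if zs else 0.0)
    return scores


def suspicious_chunks(text, size=50, threshold=1.5):
    """Return indices of chunks whose style deviates strongly from the rest of the document."""
    chunks = chunk_words(text, size)
    if not chunks:
        return []
    scores = outlier_scores([style_vector(c) for c in chunks])
    return [i for i, score in enumerate(scores) if score > threshold]
```

Calling `suspicious_chunks(document_text)` returns the positions of chunks that look stylistically foreign under these toy features. The threshold of 1.5 is arbitrary; in practice it would have to be tuned, and the per-chunk decisions are exactly the kind of unreliable result that a meta-learning step such as unmasking is meant to post-process.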