This paper proposes a new document retrieval (DR) and plagiarism detection (PD) system using a multilayer self-organizing map (MLSOM). A document is modeled by a rich tree-structured representation, and a SOM-based system is used as a computationally efficient solution. Rather than relying on keywords or individual lines, the proposed scheme uses a full document as the query for retrieval and PD. The tree-structured representation organizes document features hierarchically at the document, page, and paragraph levels. It can therefore reflect underlying context that is difficult to obtain from the word-frequency information in common use. We show that tree-structured data is effective for DR and PD. To handle the tree-structured representation efficiently, we use an MLSOM algorithm that the authors previously developed for image retrieval. In this study, it serves as an effective clustering algorithm. Using the MLSOM, local matching techniques are developed for comparing text documents, and two novel MLSOM-based PD methods are proposed. Detailed simulations are conducted, and the experimental results corroborate that the proposed approach is computationally efficient and accurate for DR and PD.
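The document, page, and paragraph hierarchy described above can be illustrated with a minimal Python sketch. This is not the authors' MLSOM: the `Node` class, the `build_tree` input format (a list of pages, each a list of paragraph strings), the raw word counts used as features, and the brute-force cosine comparison are all assumptions made for illustration. The sketch only shows why paragraph-level (local) matching can expose a copied passage that a whole-document comparison dilutes.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import sqrt
import re


@dataclass
class Node:
    """One node of the tree: a document, a page, or a paragraph (hypothetical class)."""
    level: str                      # "document", "page", or "paragraph"
    features: Counter               # word counts for all text under this node
    children: list = field(default_factory=list)


def word_counts(text):
    """Lowercase the text and count alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def build_tree(pages):
    """Build a three-level tree; `pages` is a list of lists of paragraph strings (assumed format)."""
    doc = Node("document", Counter())
    for paragraph_texts in pages:
        page = Node("page", Counter())
        for text in paragraph_texts:
            para = Node("paragraph", word_counts(text))
            page.children.append(para)
            page.features.update(para.features)   # page features aggregate its paragraphs
        doc.children.append(page)
        doc.features.update(page.features)        # document features aggregate its pages
    return doc


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def paragraphs(doc):
    """Flatten a document tree into its paragraph nodes."""
    return [para for page in doc.children for para in page.children]


def best_local_match(query, candidate):
    """Highest paragraph-to-paragraph similarity between two trees (brute force, no SOM)."""
    return max(
        (cosine(q.features, c.features)
         for q in paragraphs(query) for c in paragraphs(candidate)),
        default=0.0,
    )
```

Calling `best_local_match(build_tree(...), build_tree(...))` on two documents that share one verbatim paragraph yields a local score of 1, while `cosine` on the two root nodes stays lower because the unrelated paragraphs dilute the overlap. A SOM-based system such as the one proposed would replace the exhaustive pairwise loop with a lookup over clustered nodes.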