Automatic identification of source code authors has applications in several fields, such as source code plagiarism detection and litigation. This paper presents a new source code author identification system based on an unsupervised feature learning technique. As a method of extracting features from high-dimensional data, unsupervised feature learning has achieved great success in fields such as character recognition and image classification. However, to our knowledge, it has not been applied to source code author identification. We therefore investigated an unsupervised feature learning technique, the sparse auto-encoder, as a method of extracting features from source code files. Our system was evaluated on several datasets, and the results show that its performance is very close to that of state-of-the-art techniques in source code author identification.
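To make the idea of a sparse auto-encoder concrete, the following is a minimal NumPy sketch, not the system described in this paper. It assumes each source file has already been turned into a fixed-length vector scaled to [0, 1] (for example, normalized token frequencies). That representation, the names `train_sparse_autoencoder` and `encode`, and all hyperparameter values are hypothetical choices made for illustration. The network is trained to reconstruct its input while a KL-divergence penalty pushes the mean activation of each hidden unit toward a small target `rho`; the hidden activations then serve as learned features.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_sparse_autoencoder(X, n_hidden=8, rho=0.05, beta=3.0,
                             lam=1e-4, lr=0.5, epochs=300, seed=0):
    """Train a one-hidden-layer sparse auto-encoder with batch gradient descent.

    X is an (n_samples, n_inputs) array with values in [0, 1] (an assumption).
    Returns the encoder parameters (W1, b1) and the loss recorded per epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Forward pass: encode, then reconstruct the input.
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        # Mean activation of each hidden unit, clipped for numerical safety.
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1.0 - 1e-6)

        recon = 0.5 * np.sum((Y - X) ** 2) / n
        kl = np.sum(rho * np.log(rho / rho_hat)
                    + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
        decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
        losses.append(recon + beta * kl + decay)

        # Backward pass: reconstruction error plus the sparsity term.
        dZ2 = (Y - X) * Y * (1 - Y) / n
        sparse_grad = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / n
        dZ1 = (dZ2 @ W2.T + sparse_grad) * H * (1 - H)

        W2 -= lr * (H.T @ dZ2 + lam * W2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1 + lam * W1)
        b1 -= lr * dZ1.sum(axis=0)
    return (W1, b1), losses


def encode(X, params):
    """Map inputs to hidden-layer activations, used as learned features."""
    W1, b1 = params
    return sigmoid(X @ W1 + b1)


if __name__ == "__main__":
    # Synthetic stand-in for per-file feature vectors (not real data).
    X_demo = np.random.default_rng(1).random((40, 12))
    params, losses = train_sparse_autoencoder(X_demo)
    features = encode(X_demo, params)
    print(features.shape, losses[0], losses[-1])
```

In a setup like this, the rows returned by `encode` would be passed to a separate supervised classifier that predicts the author. The choice of classifier is outside the scope of this sketch.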