What's the code?: automatic classification of source code archives
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
The World Wide Web hosts many source code archives, in which programs are usually classified into categories by hand. We report on experiments in automatically classifying source code into these categories and examine several factors that affect classification accuracy. Weighting features by expected entropy loss significantly improves classification accuracy. We show that a Support Vector Machine can be trained to classify source code with a high degree of accuracy. These results show promise for software reuse.
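As a rough illustration of the feature weighting mentioned in the abstract, the sketch below computes expected entropy loss for a binary "token present / token absent" feature against a single category. It assumes the standard formulation (prior class entropy minus the expected class entropy after observing the feature, which coincides with information gain for a binary feature); the abstract does not give the formula, so the exact definition used in the experiments may differ. The function names and the set-of-tokens document representation are hypothetical.

```python
import math


def entropy(p):
    """Binary entropy, in bits, of a class with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def expected_entropy_loss(docs, labels, feature):
    """Score how much knowing whether `feature` occurs reduces class uncertainty.

    docs    -- list of sets of tokens, one set per source file (assumed representation)
    labels  -- list of bools, True if the file belongs to the target category
    feature -- the token to score
    """
    n = len(docs)
    prior = entropy(sum(labels) / n)

    # Split the labels by whether the feature is present in each document.
    with_f = [lab for d, lab in zip(docs, labels) if feature in d]
    without_f = [lab for d, lab in zip(docs, labels) if feature not in d]

    # Expected posterior entropy: each subset's entropy weighted by its share.
    posterior = 0.0
    for subset in (with_f, without_f):
        if subset:
            posterior += len(subset) / n * entropy(sum(subset) / len(subset))

    return prior - posterior
```

Tokens with a high score separate the category from the rest and could be kept or up-weighted before training a classifier such as a Support Vector Machine; tokens scoring near zero carry little information about the category.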