Classification of source code archives
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
The World Wide Web hosts many source code archives, usually organized by application category and programming language. Organizing these repositories by hand is not trivial, because they are very large (on the order of terabytes) and grow rapidly. We demonstrate machine learning methods that automatically classify archived source code into eleven application topics and ten programming languages. For topical classification, we concentrate on C and C++ programs from the Ibiblio and SourceForge archives. Support vector machine (SVM) classifiers are trained on examples of a given programming language or on programs in a specified category. We show that source code can be classified accurately and automatically by topical category and by programming language.
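As a rough illustration of the language-identification setting described above, the following minimal sketch trains one linear SVM per language (one-vs-rest) on a handful of invented snippets. Everything in it is an assumption made for illustration, not the method of the paper: the word-token features, the Pegasos-style subgradient training without a bias term, the toy training set, and the helper names `features`, `train_binary_svm`, `train_one_vs_rest`, and `predict`.

```python
import math
import re
from collections import Counter


def features(source):
    """L2-normalized counts of word-like tokens (an assumed, simplified feature set)."""
    counts = Counter(re.findall(r"[A-Za-z_]\w*", source))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(tok, 0.0) * v for tok, v in x.items())


def train_binary_svm(xs, ys, lam=0.01, epochs=20):
    """Linear SVM (hinge loss, L2 penalty) via Pegasos-style subgradient steps.

    Assumptions: no bias term, and examples are visited in a fixed order
    rather than sampled at random.
    """
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)           # decaying step size
            violated = y * dot(w, x) < 1.0  # margin check against the current weights
            for tok in w:                   # shrink step from the L2 penalty
                w[tok] *= 1.0 - 1.0 / t     # equals 1 - eta * lam
            if violated:                    # hinge-loss subgradient step
                for tok, v in x.items():
                    w[tok] = w.get(tok, 0.0) + eta * y * v
    return w


def train_one_vs_rest(samples):
    """Train one binary SVM per label; samples is a list of (source, label) pairs."""
    xs = [features(src) for src, _ in samples]
    labels = sorted({lab for _, lab in samples})
    return {
        lab: train_binary_svm(xs, [1 if l == lab else -1 for _, l in samples])
        for lab in labels
    }


def predict(models, source):
    """Return the label whose SVM gives the highest decision value."""
    x = features(source)
    return max(models, key=lambda lab: dot(models[lab], x))


# Toy training data, invented for this sketch.
TRAIN = [
    ('#include <stdio.h>\nint main(void) { printf("%d", 42); return 0; }', "C"),
    ("def greet(name):\n    print('hello', name)\n", "Python"),
    ("(defun square (x) (* x x))", "Lisp"),
    ("#include <stdlib.h>\nint main(void) { char *buf = malloc(16); free(buf); return 0; }", "C"),
    ("import sys\nfor arg in sys.argv:\n    print(arg)\n", "Python"),
    ("(defun add1 (n) (let ((y 1)) (+ n y)))", "Lisp"),
]

models = train_one_vs_rest(TRAIN)
print(predict(models, "int main(void) { return 1; }"))
```

A realistic setup would need far more training data, richer features, and a tuned SVM implementation; the sketch only shows the shape of the pipeline: tokenize, vectorize, train one classifier per class, and pick the class with the largest decision value.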