A class and method taxonomy for object-oriented programs

Authors: David A. Workman
Affiliations: University of Central Florida
Venue: ACM SIGSOFT Software Engineering Notes
Year: 2002

References (9)

An empirical approach for detecting program similarity and plagiarism within a university programming environment. Computers & Education.

Detecting plagiarism in student Pascal programs. The Computer Journal.

Identification of program similarity in large populations. The Computer Journal, Special issue on procedural programming.

Software metrics and plagiarism detection. Journal of Systems and Software, Special issue on using software metrics.

The unified software development process.

An algorithmic approach to the detection and prevention of plagiarism. ACM SIGCSE Bulletin.

Elements of Software Science (Operating and programming systems series).

A tool that detects plagiarism in Pascal programs. SIGCSE '81: Proceedings of the twelfth SIGCSE technical symposium on Computer science education.

A plagiarism detection system. SIGCSE '81: Proceedings of the twelfth SIGCSE technical symposium on Computer science education.

Abstract

The object-oriented approach to software design, together with the programming languages (C++, Java, and Ada95) and design notations (e.g., UML) that support this paradigm, has precipitated new interest in developing and tailoring software metrics to quantify properties of OO systems more effectively. Specifically, this research on OO software is motivated by two related problems.

1) In many computer science courses, instructors are torn between two conflicting goals: (a) increasing the number and difficulty of programming assignments to raise students' problem-solving skills and maturity, and (b) giving meaningful feedback on the correctness and quality of the programs students write. To address this problem, we are developing an automated Java program grading system. This system will compare student programs to an oracle program prepared by the instructor for a given assignment; the oracle program represents the "ideal" solution. In addition to computing a quantitative score for a student program, the grading system will also provide feedback on changes the student could or should make to improve the quality of the design of his or her solution.

2) A problem that is all too common in the computing industry is software theft, which has led to much copyright infringement litigation within our court system. As an expert witness in such cases, I have frequently been asked to evaluate two programs to determine the nature and extent of their similarity. A tool such as our planned program grading system is needed to facilitate the kind of analysis these cases require.

In the academic world, the equivalent of software theft is plagiarism.
Therefore, as an application complementary to program grading, our proposed system will also serve as a tool for identifying "cheaters" by comparing two student programs to one another rather than to the oracle.

In summary, our goal is to develop the key algorithms, and eventually a program analysis system, that will effectively determine the similarity of two programs written in the same language. Since Java is becoming one of the most widely used programming languages, and because of its relatively "clean" syntax and semantics, Java will be the focus of our initial investigation.

Java programs are composed of three essential building blocks: packages, classes, and methods. Methods are the functional or procedural units that perform or realize the algorithms necessary to solve a computational problem. Methods are grouped with encapsulated data to define classes, which are new types that extend Java's set of primitive types. Finally, classes are organized into subsystems or libraries using packages.

Thus, when comparing two Java programs to determine their similarity, we must establish a correspondence between the packages, classes, and methods of the two programs under consideration. This means that, for a given pair of units (one from each program), we must ascertain whether they are sufficiently similar to warrant being identified as "matching" in our correspondence analysis. To be similar, they must be "doing essentially the same thing"; that is, they must both serve the same computational purpose.

Assuming we are successful in developing some technique for determining similarity of purpose, we still face the potentially large number of unit pairs that must be considered in our analysis. Without some means of narrowing the search, every method (or class) in one program is a candidate match for every method (or class) in the other.
The sheer magnitude of this computational problem thus looms as a major obstacle to any practical solution. Using the names of units to limit which pairs need to be compared would certainly reduce the potential computational load. However, it is not a very reliable strategy, particularly if the author of one program has deliberately tried to disguise its similarity to another program by uniformly changing names.

Thus, to address both the computational load problem and the identification problem in comparison analysis, we plan to make an initial pass over each program to categorize methods and classes according to their purpose. The rationale is that two units will be selected for detailed comparison analysis only if they belong to the same purpose category. The focus of this paper, therefore, is to present definitions and examples of the purpose categories for methods and classes. How these purpose categories will be used in a larger comparison strategy is beyond the scope of this work; refer to Lan [13] for a more complete and detailed description of our methodology.
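The sketch below illustrates the pruning idea: unit pairs are kept for detailed comparison only when both units fall in the same purpose category. It is not the author's system or taxonomy. It assumes each method or class has already been given a category label by the initial pass. The `Unit` and `Pair` records, the `candidatePairs` method, and the labels "A", "B", and "C" are all hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CategoryFilteredPairing {

    /** A program unit (method or class) with a purpose-category label.
     *  Hypothetical: the label is assumed to come from a prior classification pass. */
    public record Unit(String name, String category) {}

    /** A pair of units, one from each program, selected for detailed comparison. */
    public record Pair(Unit left, Unit right) {}

    /** Groups the units of one program into buckets keyed by purpose category. */
    public static Map<String, List<Unit>> byCategory(List<Unit> units) {
        Map<String, List<Unit>> groups = new HashMap<>();
        for (Unit u : units) {
            groups.computeIfAbsent(u.category(), k -> new ArrayList<>()).add(u);
        }
        return groups;
    }

    /** Returns only those pairs whose two units share a purpose category.
     *  Unit names are never consulted, so uniform renaming cannot change the result. */
    public static List<Pair> candidatePairs(List<Unit> programA, List<Unit> programB) {
        Map<String, List<Unit>> groupsB = byCategory(programB);
        List<Pair> pairs = new ArrayList<>();
        for (Unit a : programA) {
            // Each unit of A is checked only against B's bucket for the same category.
            for (Unit b : groupsB.getOrDefault(a.category(), List.of())) {
                pairs.add(new Pair(a, b));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical labels "A", "B", "C"; these are NOT the paper's categories.
        List<Unit> p1 = List.of(new Unit("getTotal", "A"), new Unit("setTotal", "B"),
                                new Unit("report", "C"), new Unit("getCount", "A"));
        List<Unit> p2 = List.of(new Unit("fetchSum", "A"), new Unit("storeSum", "B"),
                                new Unit("print", "C"));
        List<Pair> pairs = candidatePairs(p1, p2);
        System.out.println("all pairs:       " + p1.size() * p2.size());
        System.out.println("candidate pairs: " + pairs.size());
        for (Pair p : pairs) {
            System.out.println(p.left().name() + " <-> " + p.right().name());
        }
    }
}
```

With the sample data, the 12 possible method pairs shrink to 4 candidates. Because the bucketing ignores names, the second program's renamed units (`fetchSum`, `storeSum`, `print`) are still paired with their counterparts. This reflects why a purpose-based filter is preferred above to a name-based one.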