Using traits of web macro scripts to predict reuse

Authors:
Chris Scaffidi;Chris Bogart;Margaret Burnett;Allen Cypher;Brad Myers;Mary Shaw
Affiliations:
School of Electrical Engineering and Computer Science, Oregon State University, 1148 Kelley Engineering Center, Oregon State University, Corvallis, OR 97331-4501, USA;School of Electrical Engineering and Computer Science, Oregon State University, 1148 Kelley Engineering Center, Oregon State University, Corvallis, OR 97331-4501, USA;School of Electrical Engineering and Computer Science, Oregon State University, 1148 Kelley Engineering Center, Oregon State University, Corvallis, OR 97331-4501, USA;IBM Research-Almaden, USA;Carnegie Mellon University, USA;Carnegie Mellon University, USA
Venue:
Journal of Visual Languages and Computing
Year:
2010



Abstract

To help people find code that they might want to reuse, repositories of end-user code typically sort scripts by number of downloads, ratings, or other information based on prior uses of the code. However, this information is unavailable when the code is new or has not yet been reused. Addressing this problem requires identifying reusable code based solely on information that exists when a script is created. To provide such a model for web macro scripts, we identified script traits that might plausibly predict reuse, then used IBM CoScripter repository logs to statistically test how well each trait corresponded to actual reuse. These tests confirmed that the traits generally did correspond to higher levels of reuse, as anticipated. We then developed a machine learning model that uses these traits as features to predict reuse of macros. Evaluating this model on repository logs showed that its accuracy is comparable to that of existing machine learning models for predicting reuse, but with a much simpler structure. Sensitivity analysis revealed that our model is quite robust; its quality is greatly reduced only when parameters are set to such extreme values that the model becomes inordinately selective. Testing the model with individual traits revealed those that provided the best predictions on their own. Based on these results, we outline opportunities for using our model to improve repositories of end-user code.
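
The abstract does not spell out the model's internals, so the following is only a minimal sketch of what a simple trait-based reuse predictor could look like, not the authors' implementation. The trait names (`num_comments`, `num_literals`, `title_length`), the thresholds, and the vote-counting scheme are all assumptions made for illustration. The only property taken from the abstract is that every input is known when the script is created.

```python
from dataclasses import dataclass


@dataclass
class Script:
    # Hypothetical traits, all available at creation time (not from the paper).
    num_comments: int
    num_literals: int
    title_length: int


# Each rule votes that a script looks reusable. Thresholds are illustrative only.
RULES = [
    lambda s: s.num_comments >= 1,   # some documentation is present
    lambda s: s.num_literals <= 2,   # few hard-coded values
    lambda s: s.title_length >= 10,  # descriptive title
]


def predict_reuse(script: Script, min_votes: int = 2) -> bool:
    """Predict reuse when at least `min_votes` rules are satisfied."""
    votes = sum(1 for rule in RULES if rule(script))
    return votes >= min_votes
```

In this sketch, raising `min_votes` toward the total number of rules makes the predictor flag fewer and fewer scripts, which is one way a model of this kind could become overly selective at extreme parameter settings.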