Static test case prioritization using topic models

Authors:
Stephen W. Thomas;Hadi Hemmati;Ahmed E. Hassan;Dorothea Blostein
Affiliations:
School of Computing, Queen's University, Kingston, Canada;School of Computing, Queen's University, Kingston, Canada;School of Computing, Queen's University, Kingston, Canada;School of Computing, Queen's University, Kingston, Canada
Venue:
Empirical Software Engineering
Year:
2014

Abstract

Software development teams use test suites to test changes to their source code. In many situations, the test suites are so large that executing every test for every source code change is infeasible due to time and resource constraints. Development teams therefore need to prioritize their test suites so that as many distinct faults as possible are detected early in the execution of the test suite. We consider the problem of static black-box test case prioritization (TCP), where test suites are prioritized without access to the source code of the system under test (SUT). We propose a new static black-box TCP technique that represents test cases using a previously unused data source in the test suite: the linguistic data of the test cases, i.e., their identifier names, comments, and string literals. Our technique applies a text analysis algorithm called topic modeling to the linguistic data to approximate the functionality of each test case, which allows it to give high priority to test cases that test different functionalities of the SUT. We compare our proposed technique with existing static black-box TCP techniques in a case study of multiple real-world open source systems: several versions of Apache Ant and Apache Derby. We find that our technique outperforms existing static black-box TCP techniques and performs comparably to or better than two existing execution-based TCP techniques. Static black-box TCP methods are widely applicable because the only input they require is the source code of the test cases themselves. This contrasts with other TCP techniques, which require access to the runtime behavior, the specification models, or the source code of the SUT.
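
The prioritization idea in the abstract can be illustrated with a minimal sketch. It assumes that a topic model has already been fitted to the linguistic data and that each test case is summarized by a vector of topic memberships. The test names, the vectors, the Manhattan distance, and the greedy farthest-first ordering are all illustrative assumptions here; the abstract does not specify the distance measure or the ordering algorithm.

```python
def manhattan(p, q):
    """Manhattan (L1) distance between two equal-length topic vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def prioritize(topic_vectors: dict[str, list[float]]) -> list[str]:
    """Order test cases so that dissimilar tests run early.

    Assumption: a test's topic vector approximates the functionality it
    exercises, so tests that are far apart cover different functionality.
    """
    remaining = dict(topic_vectors)
    if not remaining:
        return []

    # Seed with the test that is farthest, in total, from all other tests.
    first = max(
        remaining,
        key=lambda t: sum(manhattan(remaining[t], v) for v in remaining.values()),
    )
    order = [first]
    del remaining[first]

    # Repeatedly pick the test whose nearest already-chosen test is farthest.
    while remaining:
        nxt = max(
            remaining,
            key=lambda t: min(
                manhattan(remaining[t], topic_vectors[s]) for s in order
            ),
        )
        order.append(nxt)
        del remaining[nxt]
    return order


# Hypothetical topic memberships over three topics (each row sums to 1).
example = {
    "testCopyFile": [0.75, 0.25, 0.0],
    "testMoveFile": [0.5, 0.375, 0.125],
    "testParseXml": [0.0, 0.125, 0.875],
    "testZipArchive": [0.125, 0.75, 0.125],
}
order = prioritize(example)
print(order)
```

In this toy input, the two file-handling tests have similar topic vectors, so one of them is deferred to the end of the order while the tests that look most different are scheduled first.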