The effects of identifier retention and stop word removal on a latent Dirichlet allocation based feature location technique

Authors:
Lauren R. Biggers
Affiliations:
The University of Alabama, Tuscaloosa, AL
Venue:
Proceedings of the 50th Annual Southeast Regional Conference
Year:
2012

Abstract

Feature location, an important task in program comprehension, is the process by which a developer identifies the source code entity or entities that implement a given functionality. Researchers have applied static analysis techniques to multiple software maintenance tasks, including feature location. These techniques operate on a document corpus, and building a suitable source code corpus requires configuration and preprocessing decisions. Currently, the software engineering literature offers little guidance for making such decisions. This paper focuses on two preprocessing methods for source code corpora: identifier splitting and stop word lists. We experiment on three open source Java test suites: Mylyn 1.0.1, Rhino 1.5R5, and Rhino 1.6R5. Our results indicate that identifier splitting and stop word list decisions do not significantly affect the performance of the latent Dirichlet allocation (LDA) based feature location technique.
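
The abstract does not spell out the preprocessing steps, so the following is only a minimal sketch of what identifier splitting, identifier retention, and stop word filtering might look like when building a corpus from Java source tokens. The function names, the `retain_original` flag, and the tiny stop list are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical stop list: a few Java keywords plus common English words.
# The stop lists actually studied in the paper are not given in the abstract.
STOP_WORDS = {"public", "static", "void", "int", "return", "the", "a", "of", "to"}

# Matches, in order: an acronym followed by a capitalized word (XML in
# XMLDocument), a capitalized or lowercase word, an all-caps run, a digit run.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_identifier(identifier):
    """Split a camelCase or under_score identifier into lowercase terms."""
    parts = []
    for chunk in identifier.split("_"):
        parts.extend(match.lower() for match in _WORD.findall(chunk))
    return parts


def preprocess(tokens, split=True, retain_original=False, stop_words=STOP_WORDS):
    """Turn raw source tokens into corpus terms.

    split:           break compound identifiers into their constituent terms
    retain_original: also keep the unsplit identifier alongside its parts
    stop_words:      terms to drop after splitting
    """
    terms = []
    for token in tokens:
        if split:
            parts = split_identifier(token)
            if retain_original and len(parts) > 1:
                terms.append(token.lower())
            terms.extend(parts)
        else:
            terms.append(token.lower())
    return [term for term in terms if term not in stop_words]
```

For instance, `preprocess(["public", "void", "getFileName"])` yields the terms `get`, `file`, and `name`, while passing `retain_original=True` additionally keeps `getfilename`. Toggling these options is one way to produce the alternative corpus configurations whose effect on an LDA-based technique could then be compared.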