Experiences with text mining large collections of unstructured systems development artifacts at JPL

Authors:
Dan Port;Allen Nikora;Jairus Hihn;LiGuo Huang
Affiliations:
University of Hawaii, Honolulu, HI, USA;California Institute of Technology, Pasadena, CA, USA;California Institute of Technology, Pasadena, CA, USA;Southern Methodist University, Dallas, TX, USA
Venue:
Proceedings of the 33rd International Conference on Software Engineering
Year:
2011

Abstract

Repositories of systems engineering artifacts at NASA's Jet Propulsion Laboratory (JPL) are often so large and poorly structured that they have outgrown our capacity to process their contents manually and extract useful information. Sophisticated text mining (TM) methods and tools seem to offer a quick, low-effort way to automate our limited manual efforts. Our experience exploring such methods in three main areas at JPL (historical risk analysis, defect identification based on requirements analysis, and analysis of system anomalies over time) has shown that obtaining useful results requires substantial unanticipated effort, from preprocessing the data to transforming the output for practical use. We have not observed any quick 'wins' or realized any benefit from short-term effort avoidance through automation in this area. Surprisingly, however, we have realized a number of unexpected long-term benefits from the process of applying text mining to our repositories. This paper elaborates on some of these benefits and on the important lessons we learned from preparing and applying text mining to large unstructured system artifacts at JPL, with the aim of benefiting future TM applications in similar problem domains and in the hope that these lessons extend to broader application areas.