Emails related to the development of a software system contain information about design choices and issues encountered during the development process. Exploiting the knowledge embedded in emails with automatic tools is challenging because this communication medium is unstructured, noisy, and mixes several languages. Natural language text is often not well-formed and is interleaved with content that follows other syntaxes, such as code or stack traces. We present an approach to classify email content at the line level. Our technique assigns each email line to one of five categories (text, junk, code, patch, or stack trace) so that analysis techniques tailored to each category can be applied afterwards. We evaluated our approach on a statistically significant set of emails gathered from the mailing lists of four unrelated open source systems.
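To make the idea of line-level classification concrete, the sketch below assigns each line of an email body to one of the five categories named above. It is only an illustration under stated assumptions: the regular expressions, the order in which they are tried, the choice of what counts as junk, and the names `classify_line` and `classify_email` are all hypothetical and are not taken from the approach described here, which this abstract does not detail.

```python
import re

# Hypothetical hand-written rules, for illustration only. They are NOT the
# classifier described in the abstract; the patterns are rough assumptions
# tuned for Java-style stack traces and unified diffs.
_STACK = re.compile(
    r"^\s*(at\s+[\w$.]+\(.*\)|Caused by:|Exception in thread\b"
    r"|[\w.]+(Exception|Error)\b)"
)
_PATCH = re.compile(r"^(diff\s|Index:\s|---\s|\+\+\+\s|@@\s.*@@|[+-](?![+-]))")
_JUNK = re.compile(r"^\s*(>|--\s*$|_{5,}|-{5,}|On .* wrote:\s*$|Sent from\b)")
_CODE = re.compile(
    r"(;\s*$|[{}]\s*$"
    r"|^\s*(public|private|protected|static|class|import|return|if|for|while)\b.*[({;]\s*$)"
)

# Rules are tried in this order; the first match wins (an assumed priority).
_RULES = (
    ("stack trace", _STACK),
    ("patch", _PATCH),
    ("junk", _JUNK),
    ("code", _CODE),
)


def classify_line(line: str) -> str:
    """Return one of: text, junk, code, patch, stack trace."""
    for category, pattern in _RULES:
        if pattern.search(line):
            return category
    # Anything that no rule recognizes is treated as natural language text.
    return "text"


def classify_email(body: str) -> list[tuple[str, str]]:
    """Classify every line of an email body, keeping the original order."""
    return [(classify_line(line), line) for line in body.splitlines()]


if __name__ == "__main__":
    sample = "\n".join([
        "Hi all, I think the parser is broken.",
        "java.lang.NullPointerException: foo",
        "    at org.example.Parser.parse(Parser.java:42)",
        "@@ -10,7 +10,8 @@",
        "+    int count = 0;",
        "int count = 0;",
        "-- ",
        "Sent from my phone",
    ])
    for category, line in classify_email(sample):
        print(f"{category:12} | {line}")
```

Each category's lines can then be passed to a technique suited to that category, for example a diff parser for patch lines or a natural language pipeline for text lines. Fixed rules like these misfire on ambiguous lines (a bulleted list item starting with "-" looks like a removed patch line), which is one reason a context-aware classifier is preferable in practice.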