In this paper, we introduce a new problem: automatically capturing programming content in online discussions. We expect that solving this problem will help improve the visual presentation of programming forum content, the qualitative analysis of forum contributions, and forum text preprocessing and normalization. We map the problem to a sequence learning problem and solve it with Conditional Random Fields (CRF). We compare CRF performance with a word-feature-based baseline and with a non-sequence classification method (Naïve Bayes). The CRF method produces the best results, with an F1-score of 86.9%. Moreover, we demonstrate that the CRF classifier maintains good accuracy across domains: a model learned from a C++ forum performs almost as well on forums for other programming languages, namely Java and Python. As a demonstration of how the captured information can be used, we provide an example of user profiling based on programming content. In particular, we correlate the percentage of programming content in students' answers with their course performance.
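To make the sequence-learning framing concrete, here is a minimal sketch under stated assumptions. It assumes each forum post is split into lines and each line receives a `TEXT` or `CODE` label. The abstract does not state the paper's actual labeling unit or feature set. The feature names, the emission weights, and the transition scores below are hypothetical hand-set values that stand in for parameters a CRF would learn from labeled data. Only the Viterbi decoding step of a linear-chain model is shown, not CRF training. The helper `code_fraction` is also hypothetical. It illustrates the kind of "percentage of programming content" measure used for user profiling.

```python
from typing import Dict, List

LABELS = ("TEXT", "CODE")

def line_features(line: str) -> Dict[str, float]:
    """Hypothetical binary features for one line of a forum post."""
    s = line.strip()
    return {
        "ends_with_semicolon_or_brace": 1.0 if s.endswith((";", "{", "}")) else 0.0,
        "has_parens": 1.0 if "(" in s and ")" in s else 0.0,
        "indented": 1.0 if line.startswith(("    ", "\t")) else 0.0,
        "starts_with_capital": 1.0 if s[:1].isupper() else 0.0,
        "ends_like_sentence": 1.0 if s.endswith((".", "?", "!")) else 0.0,
    }

# Hypothetical weights; a trained CRF would learn these values.
EMISSION_WEIGHTS = {
    "CODE": {"ends_with_semicolon_or_brace": 2.0, "has_parens": 1.0, "indented": 1.5,
             "starts_with_capital": -0.5, "ends_like_sentence": -1.5},
    "TEXT": {"ends_with_semicolon_or_brace": -1.0, "has_parens": -0.2, "indented": -0.5,
             "starts_with_capital": 1.0, "ends_like_sentence": 1.5},
}
# Transition scores reward staying in the same label (code tends to come in blocks).
TRANSITION = {("TEXT", "TEXT"): 1.0, ("TEXT", "CODE"): -0.5,
              ("CODE", "CODE"): 1.0, ("CODE", "TEXT"): -0.5}

def emission_score(label: str, feats: Dict[str, float]) -> float:
    """Weighted sum of one line's features under a given label."""
    return sum(EMISSION_WEIGHTS[label][name] * value for name, value in feats.items())

def viterbi(lines: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under the linear-chain model."""
    if not lines:
        return []
    feats = [line_features(line) for line in lines]
    # best[i][y]: best total score of any label path for lines[0..i] ending in label y
    best = [{y: emission_score(y, feats[0]) for y in LABELS}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(lines)):
        scores, ptrs = {}, {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[i - 1][p] + TRANSITION[(p, y)])
            scores[y] = best[i - 1][prev] + TRANSITION[(prev, y)] + emission_score(y, feats[i])
            ptrs[y] = prev
        best.append(scores)
        back.append(ptrs)
    # Follow back-pointers from the best final label to recover the full path.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for i in range(len(lines) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

def code_fraction(labels: List[str]) -> float:
    """Share of lines labeled CODE (hypothetical profiling measure)."""
    return labels.count("CODE") / len(labels) if labels else 0.0

if __name__ == "__main__":
    post = [
        "I get a compile error with this loop.",
        "for (int i = 0; i < n; i++) {",
        "    total += v[i];",
        "}",
        "Any idea what is wrong?",
    ]
    labels = viterbi(post)
    print(labels)                 # ['TEXT', 'CODE', 'CODE', 'CODE', 'TEXT']
    print(code_fraction(labels))  # 0.6
```

With these illustrative weights, the example post decodes as prose, three lines of code, then prose again. The transition scores let each line's label depend on its neighbors. A per-line classifier such as Naïve Bayes labels every line independently and has no access to that sequential context.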