A weakly-supervised approach to argumentative zoning of scientific documents

Authors:
Yufan Guo; Anna Korhonen; Thierry Poibeau
Affiliations:
University of Cambridge, UK; University of Cambridge, UK; LaTTiCe, CNRS & ENS, France
Venue:
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2011

Abstract

Argumentative Zoning (AZ) -- analysis of the argumentative structure of a scientific paper -- has proved useful for a number of information access tasks. Current approaches to AZ rely on supervised machine learning (ML). Because they require large amounts of annotated data, these approaches are expensive to develop and to port to different domains and tasks. A potential solution to this problem is to use weakly-supervised ML instead. We investigate the performance of four weakly-supervised classifiers on scientific abstract data annotated for multiple AZ classes. Our best classifier, based on the combination of active learning and self-training, outperforms our best supervised classifier, yielding a high accuracy of 81% when using just 10% of the labeled data. This result suggests that weakly-supervised learning could be employed to improve the practical applicability and portability of AZ across different information access tasks.
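
The combination of active learning and self-training mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier, the synthetic 2-D data, the two zone labels, and every function and parameter name (`active_self_training`, `n_query`, `n_self`, `oracle`) are assumptions chosen only to keep the example self-contained. In each round the model asks an annotator (the oracle) for gold labels on the examples it is least sure about, and assigns its own predicted labels to the examples it is most sure about.

```python
import random

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class NearestCentroid:
    """Stand-in classifier (an assumption, not the paper's model)."""
    def fit(self, X, y):
        groups = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        self.centroids = {lab: centroid(pts) for lab, pts in groups.items()}
        return self

    def predict_with_margin(self, x):
        """Return (label, margin); a small margin means low confidence."""
        ranked = sorted((dist2(x, c), lab) for lab, c in self.centroids.items())
        return ranked[0][1], ranked[1][0] - ranked[0][0]

def active_self_training(X, oracle, seed_idx, rounds=5, n_query=2, n_self=5):
    labels = {i: oracle(i) for i in seed_idx}  # index -> gold or pseudo label
    n_gold = len(labels)
    for _ in range(rounds):
        idx = list(labels)
        model = NearestCentroid().fit([X[i] for i in idx], [labels[i] for i in idx])
        pool = [i for i in range(len(X)) if i not in labels]
        if not pool:
            break
        # Sort the unlabeled pool from least to most confident.
        ranked = sorted(pool, key=lambda i: model.predict_with_margin(X[i])[1])
        # Active learning step: request gold labels for the least confident.
        for i in ranked[:n_query]:
            labels[i] = oracle(i)
            n_gold += 1
        # Self-training step: trust the model on the most confident.
        confident = ranked[n_query:][-n_self:] if n_self > 0 else []
        for i in confident:
            labels[i] = model.predict_with_margin(X[i])[0]
    idx = list(labels)
    final = NearestCentroid().fit([X[i] for i in idx], [labels[i] for i in idx])
    return final, n_gold

# Synthetic stand-in data: two well-separated clusters, hypothetical labels.
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)] \
  + [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(100)]
y_true = ["BACKGROUND"] * 100 + ["METHOD"] * 100

model, n_gold = active_self_training(X, lambda i: y_true[i], seed_idx=[0, 100])
correct = sum(model.predict_with_margin(x)[0] == t for x, t in zip(X, y_true))
accuracy = correct / len(X)
print(f"gold labels used: {n_gold} of {len(X)}, accuracy: {accuracy:.2f}")
```

The point of the design is that annotation effort goes only to the uncertain examples, while confident predictions enlarge the training set at no labeling cost. The toy data here is far easier than real AZ data, so the printed accuracy says nothing about the results reported in the abstract.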