Combining labeled and unlabeled data for learning cross-document structural relationships

Authors:
Zhu Zhang; Dragomir Radev
Affiliations:
University of Michigan, Ann Arbor, MI; University of Michigan, Ann Arbor, MI
Venue:
IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing
Year:
2004

Citing 18
Cited 5

Bagging predictors

Machine Learning
A decision-theoretic generalization of on-line learning and an application to boosting

Journal of Computer and System Sciences - Special issue: 26th annual ACM symposium on the theory of computing & STOC'94, May 23–25, 1994, and second annual European conference on computational learning theory (EuroCOLT'95), March 13–15, 1995
Combining labeled and unlabeled data with co-training

COLT '98 Proceedings of the eleventh annual conference on Computational learning theory
Computer Evaluation of Indexing and Text Processing

Journal of the ACM (JACM)
BoosTexter: A Boosting-based System for Text Categorization

Machine Learning - Special issue on information retrieval
Towards CST-enhanced summarization

Eighteenth national conference on Artificial intelligence
A Maximum-Entropy-Inspired Parser
The rhetorical parsing, summarization, and generation of natural language texts
The rhetorical parsing, summarization, and generation of natural language texts
Head-driven statistical models for natural language parsing
Learning cross-document structural relationships using boosting

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Unsupervised word sense disambiguation rivaling supervised methods

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Scaling to very very large corpora for natural language disambiguation

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Bootstrapping

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
An unsupervised approach to recognizing discourse relations

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Weakly supervised natural language learning without redundant views

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
A common theory of information fusion from multiple text sources step one: cross-document structure

SIGDIAL '00 Proceedings of the 1st SIGdial workshop on Discourse and dialogue - Volume 10

Weakly-supervised relation classification for information extraction

Proceedings of the thirteenth ACM international conference on Information and knowledge management
Review article: A review of structured document retrieval (SDR) technology to improve information access performance in engineering document management

Computers in Industry
Statement map: assisting information credibility analysis by visualizing arguments

Proceedings of the 3rd workshop on Information credibility on the web
Statement map: reducing web information credibility noise through opinion classification

AND '10 Proceedings of the fourth workshop on Analytics for noisy unstructured text data
Revisiting Cross-document Structure Theory for multi-document discourse parsing

Information Processing and Management: an International Journal

Abstract

Multi-document discourse analysis has emerged with the potential to improve various NLP applications. Based on the newly proposed Cross-document Structure Theory (CST), this paper describes an empirical study that classifies CST relationships between sentence pairs extracted from topically related documents, exploiting both labeled and unlabeled data. We investigate a binary classifier for determining the existence of structural relationships and a full classifier using the full taxonomy of relationships. We show that in both cases the exploitation of unlabeled data helps improve the performance of the learned classifiers.