Revisiting Cross-document Structure Theory for multi-document discourse parsing

Authors:
Erick Galani Maziero;Maria Lucía Del Rosário Castro Jorge;Thiago Alexandre Salgueiro Pardo
Venue:
Information Processing and Management: an International Journal
Year:
2014




Abstract

Multi-document discourse parsing aims to automatically identify the relations among textual spans from different texts on the same topic. With the growing amount of information and the emergence of new technologies that deal with many information sources, more precise and efficient parsing techniques are required. The most relevant theory of multi-document relationships, Cross-document Structure Theory (CST), has been used for parsing before, but the results have not been satisfactory. CST has been widely criticized for its subjectivity, which may lead to low annotation agreement and, consequently, to poor parsing performance. In this work, we propose a refinement of the original CST that consists of (i) formalizing the definitions of the relations, (ii) pruning and combining some relations based on their meaning, and (iii) organizing the relations in a hierarchical structure. The hypothesis behind this refinement is that it will lead to better annotation agreement and, consequently, to better parsing results. To test this hypothesis, a corpus was annotated according to the refinement, and an improvement in annotation agreement was observed. Based on this corpus, a parser was developed using machine learning techniques and hand-crafted rules. Specifically, hierarchical techniques were used to capture the hierarchical organization of the relations in the proposed refinement of CST. These two approaches were used to identify the relations among text spans and to generate the multi-document annotation structure. The results outperformed other CST parsers, showing the adequacy of the proposed refinement of the theory.
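
The abstract ties the refinement to annotation agreement but does not say how agreement was measured. A common choice for two annotators is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is only an illustration under that assumption: the function name `cohen_kappa` and the toy label sequences are hypothetical, and the relation names are used purely as example labels, not as data from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: each position in the two lists refers to the same
    sentence pair, and each annotator assigns exactly one label.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two
    # annotators' marginal proportions, summed over all labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations of four sentence pairs.
annotator_1 = ["Overlap", "Overlap", "Elaboration", "Contradiction"]
annotator_2 = ["Overlap", "Elaboration", "Elaboration", "Contradiction"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on three of four items (0.75 raw agreement), while chance agreement is 5/16, so kappa is lower than the raw figure. Merging easily confused relations, as the refinement proposes, would typically raise observed agreement on such data.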