Multi-document discourse parsing aims to automatically identify the relations among textual spans from different texts on the same topic. With the growing amount of information and the emergence of new technologies that handle many sources of information, more precise and efficient parsing techniques have become necessary. Cross-document Structure Theory (CST), the most relevant theory of multi-document relationships, has been used for parsing before, but with unsatisfactory results. CST has been widely criticized for its subjectivity, which may lead to low annotation agreement and, consequently, to poor parsing performance. In this work, we propose a refinement of the original CST that consists of (i) formalizing the relation definitions, (ii) pruning and combining some relations based on their meaning, and (iii) organizing the relations in a hierarchical structure. Our hypothesis is that this refinement leads to better annotation agreement and, consequently, to better parsing results. To test it, we built a corpus annotated according to the refined model and observed an improvement in annotation agreement. Based on this corpus, we developed a parser using machine learning techniques and hand-crafted rules. Specifically, we used hierarchical techniques to capture the hierarchical organization of the relations in the proposed refinement. These two approaches were used to identify the relations among text spans and to generate multi-document annotation structures. The resulting parser outperformed other CST parsers, showing the adequacy of the proposed refinement of the theory.
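The abstract reports improved annotation agreement under a hierarchical relation set but does not name the agreement coefficient. The sketch below assumes two annotators and Cohen's kappa. It shows how agreement can be measured both on fine-grained relations and after mapping each relation to its parent in a hierarchy. The label names, the `PARENT` mapping, and the toy annotations are hypothetical. They are not the refined CST relation set or the corpus described here.

```python
from collections import Counter

# Hypothetical two-level hierarchy: fine-grained relation -> parent category.
# Illustrative labels only; NOT the refined CST relation set from the paper.
PARENT = {
    "identity": "redundancy",
    "equivalence": "redundancy",
    "overlap": "redundancy",
    "elaboration": "complement",
    "follow-up": "complement",
    "contradiction": "contradiction",
}


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label distribution.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


def to_parent(labels: list[str]) -> list[str]:
    """Map fine-grained relations to their parent category."""
    return [PARENT[label] for label in labels]


# Toy annotations of six sentence pairs (hypothetical data).
annotator_1 = ["identity", "overlap", "elaboration",
               "follow-up", "contradiction", "equivalence"]
annotator_2 = ["equivalence", "overlap", "follow-up",
               "follow-up", "contradiction", "elaboration"]

fine_kappa = cohen_kappa(annotator_1, annotator_2)
coarse_kappa = cohen_kappa(to_parent(annotator_1), to_parent(annotator_2))
print(f"fine-grained kappa: {fine_kappa:.3f}")   # 0.400
print(f"parent-level kappa: {coarse_kappa:.3f}")  # 0.739
```

On this toy data, disagreements that stay within one parent category (identity vs. equivalence, elaboration vs. follow-up) vanish at the coarse level. Kappa therefore rises from 0.4 to about 0.74. A hierarchy lets agreement be reported at whichever granularity suits the task.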