Text representation is a central task for any approach to automatic learning from texts. It requires a format that allows texts to be interrelated even if they share no content words but deal with similar topics. Furthermore, measuring text similarities raises the question of how to organize the resulting clusters. This paper presents cohesion trees (CTs) as a data structure for the perspective, hierarchical organization of text corpora. CTs operate on alternative text representation models that take lexical organization, quantitative text characteristics, and text structure into account. It is shown that CTs realize text linkages which are lexically more homogeneous than those produced by minimal spanning trees.
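
To make the comparison baseline concrete, the following is a minimal sketch of how a minimal spanning tree might be built over a small text collection. It is not the paper's method and does not implement cohesion trees, whose construction the abstract does not describe. The bag-of-words vectors, the cosine distance, and the names `cosine_distance` and `minimum_spanning_tree` are assumptions made purely for illustration.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Cosine distance between two term-frequency Counters (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty text as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def minimum_spanning_tree(texts):
    """Prim's algorithm over pairwise text distances.

    Returns a list of (parent_index, child_index, distance) edges.
    """
    # Hypothetical representation: whitespace-tokenized, lowercased term counts.
    vectors = [Counter(t.lower().split()) for t in texts]
    n = len(vectors)
    if n == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None
        # Pick the cheapest edge that connects the tree to a new text.
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                d = cosine_distance(vectors[i], vectors[j])
                if best is None or d < best[2]:
                    best = (i, j, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges


texts = [
    "cat sat mat",
    "cat sat rug",
    "stock market fell",
    "stock market rose",
]
edges = minimum_spanning_tree(texts)
```

Because this sketch compares surface word counts only, texts with no shared words always end up at distance 1.0, which is exactly the limitation that motivates a richer text representation in the abstract.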