Extraction and representation of contextual information for knowledge discovery in texts

Authors:
Patrick Perrin; Frederick E. Petry
Affiliations:
Merck Research Laboratories, Medicinal Chemistry/Molecular Systems, Rahway, NJ; Department of Electrical Engineering and Computer Science, Tulane University, New Orleans, LA
Venue:
Information Sciences—Informatics and Computer Science: An International Journal
Year:
2003

Abstract

This paper studies the role of lexical contextual relations in unsupervised knowledge discovery in full texts. Narrative texts have an inherent structure dictated by the language usage that generates them. We suggest that the relative distance of terms within a text gives sufficient information about its structure and its relevant content. Furthermore, this structure can be used to discover implicit knowledge embedded in the text, which makes it a good candidate for effectively representing the text content in knowledge elicitation tasks. We qualitatively demonstrate that useful text structure and content can be systematically extracted by collocational lexical analysis, without the need to encode any supplemental sources of knowledge. We present an algorithm that systematically extracts the most relevant facts in the texts and labels them by their overall theme, as dictated by local contextual information. The algorithm exploits domain-independent lexical frequencies and mutual information measures to find the relevant contextual units in the texts. We report results from experiments on a real-world textual database of psychiatric evaluation reports.
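
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it names: combining lexical frequencies with a mutual information measure to score terms that occur close together. It is not the authors' method. The function name `windowed_pmi`, the `window` and `min_count` parameters, and the use of pointwise mutual information over ordered pairs are all assumptions made for illustration.

```python
import math
from collections import Counter


def windowed_pmi(tokens, window=3, min_count=2):
    """Score ordered term pairs that co-occur within `window` tokens.

    Hypothetical illustration only: uses pointwise mutual information
    (PMI), log2(p(a, b) / (p(a) * p(b))), with probabilities estimated
    from raw counts. Pairs seen fewer than `min_count` times are dropped.
    """
    unigram_counts = Counter(tokens)
    pair_counts = Counter()

    # Count each (left, right) pair where `right` follows `left`
    # by at most `window` positions.
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            pair_counts[(left, right)] += 1

    n_tokens = len(tokens)
    n_pairs = sum(pair_counts.values())

    scores = {}
    for (a, b), count in pair_counts.items():
        if count < min_count:
            continue
        p_ab = count / n_pairs
        p_a = unigram_counts[a] / n_tokens
        p_b = unigram_counts[b] / n_tokens
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


if __name__ == "__main__":
    # Made-up toy text, not data from the paper.
    text = (
        "patient reports low mood . patient reports poor sleep . "
        "patient denies low mood ."
    )
    ranked = sorted(
        windowed_pmi(text.split(), window=1).items(),
        key=lambda item: item[1],
        reverse=True,
    )
    for pair, score in ranked:
        print(pair, round(score, 2))
```

A small window keeps the score sensitive to the relative distance between terms, which is the signal the abstract emphasizes. A real system would also need tokenization, stop-word handling, and a step that labels the extracted units by theme, none of which is sketched here.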