We propose a method to mine novel, document-specific associations between terms in a collection of unstructured documents. We believe that documents are often best described by the relationships they establish. This is also evidenced by the popularity of conceptual maps, mind maps, and similar methodologies for organizing and summarizing information. Our goal is to discover term relationships that can be used to construct conceptual maps, or so-called BisoNets. The model we propose, tpf-idf-tpu, looks for pairs of terms that are associated in an individual document. It considers three aspects, the first two of which generalize tf-idf from single terms to term pairs: term pair frequency (tpf; importance for the document), inverse document frequency (idf; uniqueness in the collection), and term pair uncorrelation (tpu; independence of the terms). The last component is needed to filter out statistically dependent pairs, which the user is unlikely to consider novel or interesting. We present experimental results on two collections of documents: one extracted from Wikipedia, and one containing text mining articles with manually assigned term associations. The results indicate that the tpf-idf-tpu method can discover novel associations, that these associations differ from simply pairing tf-idf keywords, and that they better match the subjective associations of a reader.
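The three components can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration rather than the authors' definition: the function name `tpf_idf_tpu` is hypothetical, tpf is taken as the share of a document's sentence-level co-occurrences that belong to the pair, idf as the log of the collection size over the number of documents in which the pair co-occurs, tpu as one minus the overlap coefficient of the two terms' document sets, and the three are combined by multiplication.

```python
import math
from collections import Counter
from itertools import combinations


def tpf_idf_tpu(docs):
    """Score term pairs per document (illustrative formulas, not the paper's).

    docs: list of documents; each document is a list of sentences;
          each sentence is a list of term strings.
    Returns one dict per document mapping (term_a, term_b) -> score.
    """
    n_docs = len(docs)
    term_df = Counter()   # number of documents containing each term
    pair_df = Counter()   # number of documents where the pair shares a sentence
    per_doc_counts = []   # per document: sentence co-occurrence count of each pair

    for doc in docs:
        counts = Counter()
        terms = set()
        for sentence in doc:
            unique_terms = sorted(set(sentence))
            terms.update(unique_terms)
            # Sorting first makes each pair a canonical (a, b) tuple with a < b.
            counts.update(combinations(unique_terms, 2))
        per_doc_counts.append(counts)
        term_df.update(terms)
        pair_df.update(counts.keys())

    scores = []
    for counts in per_doc_counts:
        total = sum(counts.values())
        doc_scores = {}
        for (a, b), c in counts.items():
            # Assumed tpf: relative frequency of the pair within this document.
            tpf = c / total
            # Assumed idf: pairs that co-occur in few documents score higher.
            idf = math.log(n_docs / pair_df[(a, b)])
            # Assumed tpu: 0 when the rarer term always appears with the other.
            tpu = 1.0 - pair_df[(a, b)] / min(term_df[a], term_df[b])
            doc_scores[(a, b)] = tpf * idf * tpu
        scores.append(doc_scores)
    return scores


docs = [
    [["graph", "mining"], ["graph", "mining"], ["text", "mining"]],
    [["text", "mining"], ["graph", "index"]],
    [["text", "mining"]],
]
scores = tpf_idf_tpu(docs)
best_pair = max(scores[0], key=scores[0].get)
```

In this toy collection, the pair that co-occurs in every document gets an idf of zero, and a pair whose rarer term never appears without the other gets a tpu of zero, so both are filtered out. This mirrors the roles the abstract assigns to the idf and tpu components, under the assumed formulas.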