Mining Text Using Keyword Distributions

Authors:
Ronen Feldman;Ido Dagan;Haym Hirsh
Affiliations:
Department of Mathematics and Computer Science, Bar-Ilan University, Ramat-Gan, ISRAEL. E-mail: feldman@cs.biu.ac.il, dagan@cs.biu.ac.il;Department of Mathematics and Computer Science, Bar-Ilan University, Ramat-Gan, ISRAEL. E-mail: feldman@cs.biu.ac.il, dagan@cs.biu.ac.il;Department of Computer Science, Rutgers University, Piscataway, NJ USA 08855. E-mail: hirsh@cs.rutgers.edu
Venue:
Journal of Intelligent Information Systems
Year:
1998

Abstract

Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. This paper describes the KDT system for Knowledge Discovery in Text, in which documents are labeled by keywords, and knowledge discovery is performed by analyzing the co-occurrence frequencies of the various keywords labeling the documents. We show how this keyword-frequency approach supports a range of KDD operations, providing a suitable foundation for knowledge discovery and exploration for collections of unstructured text.
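
As a rough illustration of the keyword co-occurrence idea described in the abstract, the following Python sketch counts how often pairs of keywords label the same document and computes the distribution of co-occurring keywords for a chosen keyword. It is not the KDT system's implementation: the toy collection `docs` and the function names `cooccurrence_counts` and `keyword_distribution` are assumptions made only for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy collection: each document is represented only by
# the set of keywords that label it (not data from the paper).
docs = [
    {"iran", "oil", "opec"},
    {"iran", "oil"},
    {"usa", "oil"},
    {"usa", "trade"},
]


def cooccurrence_counts(labeled_docs):
    """Count how many documents are labeled with each unordered keyword pair."""
    counts = Counter()
    for labels in labeled_docs:
        # Sorting gives every pair a canonical (alphabetical) order.
        for pair in combinations(sorted(labels), 2):
            counts[pair] += 1
    return counts


def keyword_distribution(labeled_docs, given):
    """Among documents labeled with `given`, return the proportion
    that also carry each other keyword."""
    selected = [labels for labels in labeled_docs if given in labels]
    if not selected:
        return {}
    tally = Counter(k for labels in selected for k in labels if k != given)
    return {k: n / len(selected) for k, n in tally.items()}


pair_counts = cooccurrence_counts(docs)
oil_dist = keyword_distribution(docs, "oil")
```

Here `pair_counts` maps each keyword pair to the number of documents carrying both labels, and `oil_dist` gives, for the documents labeled "oil", the share that also carry each other keyword. Comparing such distributions across keywords or across subsets of the collection is one simple way to surface patterns from keyword labels alone.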