Untangling text data mining

Authors:
Marti A. Hearst
Affiliations:
University of California, Berkeley, Berkeley, CA
Venue:
ACL '99: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics
Year:
1999

Abstract

The possibilities for data mining from large text collections are virtually untapped. Text expresses a vast, rich range of information, but encodes this information in a form that is difficult to decipher automatically. Perhaps for this reason, there has been little work in text data mining to date, and most people who have talked about it have either conflated it with information access or have not made use of text directly to discover heretofore unknown information. In this paper I will first define data mining, information access, and corpus-based computational linguistics, and then discuss the relationship of these to text data mining. The intent behind these contrasts is to draw attention to exciting new kinds of problems for computational linguists. I describe examples of what I consider to be real text data mining efforts and briefly outline recent ideas about how to pursue exploratory data analysis over text.