Differentiating data- and text-mining terminology

Authors:
Jan H. Kroeze;Machdel C. Matthee;Theo J. D. Bothma
Affiliations:
Department of Informatics, School of IT, University of Pretoria, Pretoria, 0002;Department of Informatics, School of IT, University of Pretoria, Pretoria, 0002;Department of Information Science, School of IT, University of Pretoria, Pretoria, 0002
Venue:
SAICSIT '03: Proceedings of the 2003 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists on Enablement through Technology
Year:
2003

References (11):
Data mining solutions: methods and tools for solving real-world problems
Document Warehousing and Text Mining: Techniques for Improving Business Operations, Marketing, and Sales
Data Mining: Technologies, Techniques, Tools, and Trends
Data Warehousing, Data Mining, and OLAP
Database Systems Design, Implementation and Management
Principles of Information Systems: A Managerial Approach
Extraction and representation of contextual information for knowledge discovery in texts. Information Sciences—Informatics and Computer Science: An International Journal
Text analysis and knowledge mining system. IBM Systems Journal
Untangling text data mining. ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Data Mining: Concepts and Techniques
Automated text summarization and the SUMMARIST system. TIPSTER '98 Proceedings of a workshop on held at Baltimore, Maryland: October 13-15, 1998

Cited by (1):
Toward total business intelligence incorporating structured and unstructured data. Proceedings of the 2nd International Workshop on Business intelligencE and the WEB


Abstract

When a new discipline emerges, it usually takes time and considerable academic discussion before its concepts and terms are standardised. Text mining is one such discipline. In a groundbreaking paper, "Untangling text data mining", Hearst [1999] tackled the problem of clarifying text-mining concepts and terminology. This essay, a conceptual study, builds on Hearst's ideas by pointing out some inconsistencies and suggesting an improved and extended categorisation of data- and text-mining techniques. A short overview of the problems regarding text-mining concepts is given, followed by a summary and critical discussion of Hearst's attempt to clarify the terminology. The essence of text mining is found to be the discovery or creation of new knowledge from a collection of documents. The parameters of non-novel, semi-novel and novel investigation are used to differentiate between full-text information retrieval, standard text mining and intelligent text mining. The same parameters are also used to differentiate between related processes for numerical data and text metadata. These distinctions may be used as a road map in the evolving fields of data/information retrieval, knowledge discovery and the creation of new knowledge.
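
The categorisation described in the abstract can be pictured as a small lookup table. The following is only an illustrative sketch, not the authors' own formalisation: it assumes that the three novelty levels and the three full-text processes are listed in corresponding order, and the names `Novelty`, `FULL_TEXT_PROCESS` and `classify_full_text_process` are hypothetical. The related processes for numerical data and text metadata are left out because the abstract does not name them.

```python
from enum import Enum


class Novelty(Enum):
    """Novelty of the investigation, as named in the abstract."""
    NON_NOVEL = "non-novel"
    SEMI_NOVEL = "semi-novel"
    NOVEL = "novel"


# Assumption: the abstract lists the novelty levels and the full-text
# processes in matching order, so they are paired one to one here.
FULL_TEXT_PROCESS = {
    Novelty.NON_NOVEL: "full-text information retrieval",
    Novelty.SEMI_NOVEL: "standard text mining",
    Novelty.NOVEL: "intelligent text mining",
}


def classify_full_text_process(novelty: Novelty) -> str:
    """Return the full-text process associated with a novelty level."""
    return FULL_TEXT_PROCESS[novelty]


if __name__ == "__main__":
    for level in Novelty:
        print(f"{level.value}: {classify_full_text_process(level)}")
```

Using an enumeration keeps the three levels closed and explicit, so a fourth category or a second table for another data type could be added later without changing the lookup function.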