Distributed multi-lingual content based text mining DML - CBTM

Authors:
S. Chitrakala; D. Manjula
Affiliations:
Department of Computer Science and Engineering, Easwari Engineering College, Ramapuram, Anna University, Chennai, Tamil Nadu, India; Department of Computer Science and Engineering, College of Engineering, Guindy, Anna University, Chennai, Tamil Nadu, India
Venue:
ACST'07 Proceedings of the third conference on IASTED International Conference: Advances in Computer Science and Technology
Year:
2007

Abstract

With the explosion of information on the internet, extracting knowledge is becoming more complex as media-based data such as images, audio streams and videos replace textual data. A comprehensive methodology is therefore needed that covers all forms of data and can deliver their content in a short period of time. Text mining tools and algorithms are becoming increasingly popular as many books, texts and documents are converted to soft-copy versions and made globally accessible. Although this trend is predominantly in English, a similar approach is now needed for other languages too, as many ancient and out-of-print texts in different languages are being digitized to preserve them and to extract information and knowledge from them. In the context of Indian languages this need is more pronounced: texts in many languages, scripts and dialects, on materials ranging from palm leaves to stone carvings, hold a wealth of information across a variety of disciplines. In this paper, we propose a novel content-based approach, termed CBTM (Content-Based Text Mining), for knowledge discovery from multilingual texts, and demonstrate it on textual data in the first instance. The proposed methodology uses keywords and patterns stored in the form of GIF strings so that it can be extended to other forms of data. Potential applications of this approach in a distributed environment are also highlighted. We use newspaper advertisements to demonstrate the system.
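The abstract only outlines the keyword-and-pattern idea, so the following is a rough illustration rather than the authors' CBTM method. This minimal Python sketch assigns a multilingual newspaper advertisement to the category whose keywords occur most often in it. The category names, the English and Tamil keyword lists, and the functions `tokenize` and `classify` are all hypothetical; the sketch works directly on Unicode text and does not model the paper's GIF-string representation or its distributed setting.

```python
import string
from collections import Counter
from typing import List, Optional

# Hypothetical lexicon (illustrative only, not taken from the paper):
# each category maps to keywords in more than one language.
LEXICON = {
    "real_estate": {"flat", "apartment", "plot", "வீடு"},  # வீடு: "house"
    "employment": {"vacancy", "wanted", "job", "வேலை"},    # வேலை: "job"
    "matrimonial": {"bride", "groom", "மணமகன்"},           # மணமகன்: "bridegroom"
}


def tokenize(text: str) -> List[str]:
    """Split on whitespace and trim ASCII punctuation from each token.

    Whitespace splitting is used instead of a regex such as \\w+ because
    Python's \\w does not match Tamil vowel signs, which would break words.
    """
    tokens = (t.strip(string.punctuation) for t in text.lower().split())
    return [t for t in tokens if t]


def classify(ad: str) -> Optional[str]:
    """Return the category with the most keyword hits in the ad, or None."""
    counts = Counter()
    for token in tokenize(ad):
        for category, keywords in LEXICON.items():
            if token in keywords:
                counts[category] += 1
    if not counts:
        return None  # no keyword from any category was found
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    ads = [
        "2 BHK flat for sale near Guindy, contact owner.",
        "சென்னையில் வேலை வாய்ப்பு",  # Tamil: "job opportunity in Chennai"
        "Used bicycle, good condition.",
    ]
    for ad in ads:
        print(classify(ad), "<-", ad)
```

Exact keyword matching keeps the sketch short; a real system for Indian-language text would also need to handle inflected word forms (for example, Tamil case suffixes) and pattern rules beyond single keywords.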