With the explosion of information on the internet, extracting knowledge is becoming more complex as media-based data such as images, audio streams, and videos replace textual data. A comprehensive methodology covering all forms of data is therefore needed, one that can provide the contents of the data in a short period of time. Text mining tools and algorithms are becoming increasingly popular as many books, texts, and documents are converted to soft-copy versions and made globally accessible. Though this trend is predominantly in the English language, the need has arisen for such an approach in other languages too, as many ancient and out-of-print texts in different languages are being converted to soft-copy versions for preservation and for the extraction of information and knowledge. In the context of Indian languages this need is more pronounced, as many texts are available in different languages, scripts, and dialects, and in material forms ranging from palm leaves to stone carvings, holding a wealth of information in a variety of disciplines. In this paper, we propose a novel content-based approach for knowledge discovery in multilingual texts, termed CBTM (Content-Based Text-Mining), and demonstrate it on textual data in the first instance. The proposed methodology employs a content-based approach using keywords and patterns stored in the form of gif strings so that extensions to other forms of data are possible. Potential applications of this approach in a distributed environment are also highlighted. We have used newspaper advertisements to demonstrate the system.