Autonomous and adaptive identification of topics in unstructured text

Author: Louis Massey
Affiliation: Department of Mathematics and Computer Science, Royal Military College, Kingston, Canada
Venue: KES'11 Proceedings of the 15th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, Volume Part II
Year: 2011

Abstract

Existing topic identification techniques share an important limitation: they depend on human intervention, which incurs major preparation costs and limits operational flexibility when novel content appears. To address this, we propose an adaptable and autonomous algorithm that discovers topics in unstructured text documents. The algorithm rests on principles that differ from those of existing natural language processing and artificial intelligence techniques: the retrieval, activation, and decay of general-purpose lexical knowledge, inspired by how the brain may process information during reading. In contrast to the usual corpus-based bag-of-words approach, the algorithm processes the words of a single document sequentially. Empirical results demonstrate the potential of the new algorithm.
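
The abstract does not give the algorithm's details, so the following is only an illustrative sketch of the general idea of sequential retrieval, activation, and decay, not the paper's method. The toy lexicon, the function name `identify_topics`, and the `decay`, `boost`, and `top_k` parameters are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy lexicon: each word "retrieves" a few related concepts.
# A real system would draw on a general-purpose lexical resource instead.
TOY_LEXICON = {
    "bank": ["finance", "river"],
    "loan": ["finance"],
    "interest": ["finance", "attention"],
    "water": ["river"],
}


def identify_topics(words, lexicon, decay=0.8, boost=1.0, top_k=2):
    """Read words in order; activate retrieved concepts and decay the rest.

    Assumed scheme (not from the paper): multiplicative decay per word read,
    additive boost for every concept the current word retrieves.
    """
    activation = defaultdict(float)
    for word in words:
        # Decay: every concept activated so far loses some strength.
        for concept in activation:
            activation[concept] *= decay
        # Retrieval and activation: concepts linked to this word gain strength.
        for concept in lexicon.get(word.lower(), []):
            activation[concept] += boost
    # Rank by activation (ties broken alphabetically) and keep the strongest.
    ranked = sorted(activation.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, _ in ranked[:top_k]]


text = "The bank approved the loan at low interest"
topics = identify_topics(text.split(), TOY_LEXICON)
print(topics)
```

In this sketch, concepts that keep being reinforced as the document is read ("finance" here) stay strongly active, while concepts retrieved only once ("river") fade. This illustrates how a single document can be processed word by word without corpus-level statistics.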