Classifying unlabeled short texts using a fuzzy declarative approach

Authors:
Francisco P. Romero;Pascual Julián-Iranzo;Andrés Soto;Mateus Ferreira-Satler;Juan Gallardo-Casero
Affiliations:
Department of Information Technologies and Systems, University of Castilla La Mancha, Ciudad Real, Spain 13071;Department of Information Technologies and Systems, University of Castilla La Mancha, Ciudad Real, Spain 13071;Department of Computer Science, Universidad Autónoma del Carmen, Campeche, México CP 24160;Department of Information Technologies and Systems, University of Castilla La Mancha, Ciudad Real, Spain 13071;Department of Information Technologies and Systems, University of Castilla La Mancha, Ciudad Real, Spain 13071
Venue:
Language Resources and Evaluation
Year:
2013

Abstract

Web 2.0 provides user-friendly tools that allow people to create and publish content online. User-generated content often takes the form of short texts (e.g., blog posts, news feeds, snippets). This has motivated an increasing interest in the analysis of short texts and, specifically, in their categorisation. Text categorisation is the task of classifying documents into a certain number of predefined categories. Traditional text classification techniques are mainly based on statistical analysis of word frequency and have proved inadequate for the classification of short texts, where words occur too few times. Moreover, the classic approach to text categorisation is based on a learning process that requires a large number of labeled training texts to achieve accurate performance. However, labeled documents might not be available, whereas unlabeled documents can be easily collected. This paper presents an approach to text categorisation that does not need a pre-classified set of training documents. The proposed method only requires the category names as user input. Each of these categories is defined by means of an ontology of terms modelled by a set of what we call proximity equations. Hence, our method is not based on the frequency of occurrence of a category, but depends heavily on the definition of that category and on how well the text fits that definition. Therefore, the proposed approach is appropriate for short text classification, where the frequency of occurrence of a category is very small or even zero. Another feature of our method is that the classification process relies on the capabilities for flexible matching and knowledge representation of Bousi~Prolog, an extension of the standard Prolog language. This declarative approach provides a text classifier that is quick and easy to build, and a classification process that is easy for the user to understand.
The experimental results show that the proposed method achieves reasonably useful performance.
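
The idea of classifying a text by how well it fits a category definition, rather than by learned word frequencies, can be illustrated with a minimal Python sketch. It is not the authors' Bousi~Prolog implementation: the proximity table, its degrees, the symmetric lookup, the whitespace tokenizer, and the choice of summing degrees as the aggregation are all assumptions made for illustration, and the names `PROXIMITY`, `proximity`, `score`, and `classify` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical "proximity equations": (term, category) -> degree in [0, 1].
# The terms and degrees are invented for this example.
PROXIMITY: Dict[Tuple[str, str], float] = {
    ("football", "sport"): 0.9,
    ("goal", "sport"): 0.7,
    ("match", "sport"): 0.6,
    ("election", "politics"): 0.9,
    ("vote", "politics"): 0.8,
    ("minister", "politics"): 0.7,
}


def proximity(term: str, category: str) -> float:
    """Degree to which a term is close to a category name.

    Assumption: the relation is reflexive (a term is fully close to
    itself) and symmetric (the pair is looked up in both orders).
    """
    if term == category:
        return 1.0
    return max(
        PROXIMITY.get((term, category), 0.0),
        PROXIMITY.get((category, term), 0.0),
    )


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    stripped = (word.strip(".,;:!?()\"'").lower() for word in text.split())
    return [word for word in stripped if word]


def score(text: str, category: str) -> float:
    """Sum of proximity degrees of all tokens (an assumed aggregation)."""
    return sum(proximity(token, category) for token in tokenize(text))


def classify(text: str, categories: List[str]) -> Optional[str]:
    """Return the best-fitting category name, or None if nothing matches."""
    if not categories:
        return None
    scores = {category: score(text, category) for category in categories}
    best = max(scores, key=lambda category: scores[category])
    return best if scores[best] > 0.0 else None


if __name__ == "__main__":
    # Only the category names are supplied; no labeled training texts are used.
    print(classify("The minister lost the vote", ["sport", "politics"]))  # politics
```

In this sketch a text can be assigned to a category even when the category name never occurs in it, because the score depends only on how close its terms are to the category definition.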