Usually, when analyzing data that have not yet been processed or filtered, one observes that not all the data are equally important, and relevant data are commonly surrounded by non-relevant data. This is also the case with textual information, owing to its intrinsic nature: texts contain words that convey a great deal of information about the subject matter, along with other words that carry little meaning or relevance. We believe that, although the non-relevant words are in principle less important than the relevant ones, the former constitute the substrate that supports the latter. Since this substrate is the context surrounding the relevant information, we call it the contextual information. In this paper, we analyze the relevance of contextual information in textual data in a clustering-by-compression scenario. We generate the contextual information by applying a distortion technique previously developed by the authors, one of whose main characteristics is that it preserves the contextual information. We compare this technique with three new distortion techniques that destroy the contextual information in different ways. The experimental results support our hypothesis that contextual information is relevant, at least in the area of text clustering by compression.
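Clustering by compression is commonly built on the Normalized Compression Distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(·) is the compressed length of its argument. The sketch below computes NCD using Python's `zlib` as an assumed compressor (the compressor used in the experiments is not stated here). It pairs NCD with a hypothetical masking function, `mask_words`, that hides every word outside a given word set while keeping word lengths and the kept words in place. This only illustrates the general idea of distorting a text while preserving its context. It is not the authors' distortion technique, whose details this abstract does not give.

```python
import zlib
from typing import AbstractSet


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (assumed compressor, level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    Values near 0 mean the texts share much information; values near 1 mean little.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def mask_words(text: str, keep: AbstractSet[str]) -> str:
    """Hypothetical distortion (illustration only, not the authors' method):
    each whitespace-separated token whose lowercase form is not in `keep`
    is replaced by asterisks of the same length. Tokens are rejoined with
    single spaces, so runs of whitespace collapse, and punctuation attached
    to a token is masked along with it.
    """
    return " ".join(
        token if token.lower() in keep else "*" * len(token)
        for token in text.split()
    )
```

For example, `mask_words("The theory of compression", {"the", "of"})` returns `"The ****** of ***********"`: the kept words stay readable, while the masked words still occupy their original length. One design caveat is that deflate's 32 KiB window means that, when the concatenation `xy` is much longer than that, the compressor can no longer reference `x` while encoding the end of `y`, which inflates NCD for long documents.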