One of the main themes in text mining is text representation, which is fundamental to text-based intelligent information processing. Text representation generally comprises two tasks: indexing and weighting. This paper compares three text representation methods: TF*IDF, latent semantic indexing (LSI), and multi-word. We evaluated the three methods on one Chinese and one English document collection, in both information retrieval and text categorization. The experimental results show that LSI outperforms the other methods in text categorization on both collections. LSI also performs best in retrieving English documents. These results indicate that LSI has favorable semantic and statistical properties, contrary to the claim that LSI cannot provide discriminative power for indexing.
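As a rough illustration of the weighting task, the sketch below computes TF*IDF weights for a toy collection. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `tfidf`, the toy documents, and the specific variant (raw term frequency multiplied by log(N / document frequency)) are all chosen here for illustration, and the abstract does not say which variant the authors used. LSI would typically go one step further and apply a truncated singular value decomposition to the resulting term-document matrix, which is omitted here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant (not taken from the paper):
    weight = raw term frequency * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this document
        weighted.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weighted


# Hypothetical toy collection, already tokenized (the indexing step).
docs = [
    "text mining text representation".split(),
    "text retrieval".split(),
    "latent semantic indexing".split(),
]
vectors = tfidf(docs)
```

Under this variant, a term that occurs in every document receives weight zero, since log(N / N) = 0, while a term concentrated in few documents is weighted up.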