Automatic text categorization is the task of assigning text documents to predefined categories. Classifying documents requires extracting useful features. In previous research, a document is commonly represented by the term frequency and inverse document frequency of each feature. Because the sentences in a document differ in importance, features from more important sentences should be weighted more heavily than the others. In this paper, we measure the importance of sentences using text summarization techniques and then represent a document as a vector of features whose weights depend on the importance of the sentences in which they occur. To verify the method, we conduct experiments on two newsgroup data sets, one written in English and the other in Korean. Four kinds of classifiers are used in the experiments: Naive Bayes, Rocchio, k-NN, and SVM. We observe that the method yields a significant improvement for all four classifiers on both data sets.
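The idea of weighting features by sentence importance can be illustrated with a minimal Python sketch. The abstract does not say how importance is scored, so everything here is an assumption made for illustration: the helper names (`tokenize`, `split_sentences`, `title_similarity`, `weighted_term_frequencies`), the use of cosine similarity to the title as the importance score, and the `1.0 + similarity` weighting are hypothetical and are not the paper's actual formulation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(document):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", document)
    return [p.strip() for p in parts if p.strip()]


def title_similarity(sentence_tokens, title_tokens):
    # Hypothetical importance score: cosine similarity between the
    # term-count vectors of a sentence and of the document title.
    a, b = Counter(sentence_tokens), Counter(title_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def weighted_term_frequencies(title, document):
    # Each term occurrence contributes (1 + importance of its sentence)
    # instead of a flat count of 1 (assumed weighting scheme).
    title_tokens = tokenize(title)
    weights = Counter()
    for sentence in split_sentences(document):
        tokens = tokenize(sentence)
        importance = 1.0 + title_similarity(tokens, title_tokens)
        for token in tokens:
            weights[token] += importance
    return dict(weights)


weights = weighted_term_frequencies("cats", "Cats purr. Dogs bark.")
```

In this toy input, terms from the sentence that overlaps the title receive a larger weight than terms from the other sentence. The weighted frequencies could then be multiplied by inverse document frequency and passed to any of the classifiers named above, in place of plain term counts.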