A parametric methodology for text classification

Authors:
Nikitas N. Karanikolas;Christos Skourlas
Affiliations:
Department of Informatics, Technological Educational Institute (TEI) of Athens, Athens, Greece;Department of Informatics, Technological Educational Institute (TEI) of Athens, Athens, Greece
Venue:
Journal of Information Science
Year:
2010

Abstract

Finding the correct category (class) to which a new, unclassified document belongs is an interesting and difficult problem with a wide range of applications. Our methodology for narrative text classification is based on two techniques: we calculate the distance (similarity) between the new unclassified document and all the pre-classified documents of each class, and we also calculate the similarity of the new document to the ‘average class document’ of each class. In both cases we use key phrases (text phrases or key terms) as the distinctive features, so the proposed method ultimately rests on the automatic extraction of an authority list of key phrases that is appropriate for discriminating between different classes. In this paper, we apply this methodology to Greek text and present the key concepts, the algorithms, and some critical decisions. A number of parameters of the mining algorithm are also fine-tuned. The actual text classification system, the adopted (embedded) ideas and the alternative values of parameters are evaluated using two training sets (test collections).
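
The two similarity computations named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy English authority list and training vectors, the naive substring counting in `vectorize`, the use of cosine similarity as the measure, and the choice to average the per-document similarities within a class. The paper's own weighting, parameters and way of combining the two scores are not given in the abstract, so the sketch simply reports both scores per class.

```python
import math

def vectorize(text, authority_list):
    """Count occurrences of each key phrase (naive substring match, an assumption)."""
    lowered = text.lower()
    return [lowered.count(phrase) for phrase in authority_list]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a class's vectors: the 'average class document'."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def score_classes(new_vec, training):
    """Return {class: (mean similarity to class documents, similarity to centroid)}."""
    scores = {}
    for label, vectors in training.items():
        per_document = sum(cosine(new_vec, v) for v in vectors) / len(vectors)
        average_document = cosine(new_vec, centroid(vectors))
        scores[label] = (per_document, average_document)
    return scores

# Hypothetical authority list and pre-classified documents (already vectorized).
authority_list = ["interest rate", "stock market", "goal", "coach"]
training = {
    "economy": [[2, 1, 0, 0], [1, 1, 0, 0]],
    "sports": [[0, 0, 2, 1], [0, 0, 1, 1]],
}

new_vec = vectorize("The coach praised the late goal", authority_list)
scores = score_classes(new_vec, training)
best_by_documents = max(scores, key=lambda label: scores[label][0])
best_by_centroid = max(scores, key=lambda label: scores[label][1])
```

In this toy case both criteria select the same class. On real collections they can disagree, which is one reason a method might consult both rather than rely on either alone.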