The effect of selecting varying numbers and kinds of features for use in predicting category membership was investigated on the Reuters and MUC-3 text categorization data sets. Good categorization performance was achieved using a statistical classifier and a proportional assignment strategy. The optimal feature set size for word-based indexing was found to be surprisingly low (10 to 15 features) despite the large training sets. The extraction of new text features by syntactic analysis and feature clustering was investigated on the Reuters data set. Syntactic indexing phrases, clusters of these phrases, and clusters of words were all found to provide less effective representations than individual words.
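The abstract does not spell out the proportional assignment strategy. As a rough illustration only, one common reading is that a category is assigned to its top-scoring test documents, with the number of assignments proportional to how often the category occurred in training. The sketch below follows that assumption; the function name, its parameters, and the constant `k` are hypothetical and are not taken from the paper.

```python
import math


def proportional_assignment(scores, train_fraction, k=1.0):
    """Assign one category to its top-scoring test documents.

    Assumed reading of "proportional assignment" (not confirmed by the abstract):
    the category goes to roughly k * train_fraction * len(scores) documents.

    scores: classifier scores, one per test document (higher = more likely).
    train_fraction: fraction of training documents that carried the category.
    k: hypothetical proportionality constant that trades recall for precision.
    Returns the set of indices of the documents assigned the category.
    """
    n_docs = len(scores)
    # Number of documents to label, capped at the size of the test set.
    n_assign = min(n_docs, math.ceil(k * train_fraction * n_docs))
    # Rank document indices from highest to lowest score.
    ranked = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return set(ranked[:n_assign])


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.4, 0.7]
    print(proportional_assignment(scores, train_fraction=0.5))
```

Under this reading, raising `k` assigns the category to more documents, which would tend to raise recall at the cost of precision.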