This paper proposes an effective algorithm for extracting two useful features from text documents for analyzing word collocation habits: "Frequency Rank Ratio" (FRR) and "Intimacy". FRR is derived from a word's rank index according to its frequency. Intimacy, computed by a compact language model called the Influence Language Model (ILM), measures how close a word is to the other words in the same sentence. Using the proposed features, a visualization framework is developed for word collocation analysis. To evaluate the framework, two corpora were designed and collected from real-life data covering diverse topics and genres. Extensive simulations illustrate the feasibility and effectiveness of the visualization framework. The results demonstrate that the proposed features and algorithm support reliable and efficient text analysis.
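The abstract states only that FRR is derived from a word's frequency rank and does not give the formula. As a rough illustration of the ranking step, the sketch below ranks words by descending frequency and then divides each rank by the vocabulary size. That normalization is an assumption made for this example and may differ from the paper's definition; the function names `frequency_ranks` and `rank_ratio` are hypothetical.

```python
from collections import Counter


def frequency_ranks(tokens):
    """Map each word to its 1-based rank by descending frequency.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda word: (-counts[word], word))
    return {word: rank for rank, word in enumerate(ordered, start=1)}


def rank_ratio(tokens):
    """Divide each word's rank by the vocabulary size.

    ASSUMPTION: this normalization is illustrative only; it is not
    taken from the paper, whose exact FRR formula is not given here.
    """
    ranks = frequency_ranks(tokens)
    vocab_size = len(ranks)
    return {word: rank / vocab_size for word, rank in ranks.items()}


# Toy input; a real corpus would need proper tokenization.
tokens = "the cat sat on the mat the cat".split()
ranks = frequency_ranks(tokens)
ratios = rank_ratio(tokens)
```

Under this assumed normalization, frequent words receive ratios near zero and rare words receive ratios near one, which gives a corpus-size-independent scale for comparing words.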