Recognition of word collocation habits using frequency rank ratio and inter-term intimacy

Authors:
Peng Tang;Tommy W. S. Chow
Affiliations:
Department of Electronic Engineering, City University of Hong Kong, Hong Kong;Department of Electronic Engineering, City University of Hong Kong, Hong Kong
Venue:
Expert Systems with Applications: An International Journal
Year:
2013

Abstract

We propose an algorithm that extracts two features from text documents for analyzing word collocation habits: "Frequency Rank Ratio" (FRR) and "Intimacy". FRR is derived from the rank of a word when words are ordered by frequency. Intimacy, computed by a compact language model called the Influence Language Model (ILM), measures how close a word is to the other words in the same sentence. Using these two features, we develop a visualization framework for word collocation analysis. To evaluate the framework, we design and collect two corpora from real-life data covering diverse topics and genres. Extensive simulations illustrate the feasibility and effectiveness of the visualization framework. The results demonstrate that the proposed features and algorithm support reliable and efficient text analysis.
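
The abstract does not give formulas for either feature, so the following Python sketch only illustrates the general idea under loudly stated assumptions. It assumes FRR can be approximated as a word's frequency rank divided by the vocabulary size, and it replaces Intimacy with a simple inverse-distance proximity score between words in the same sentence. That proximity score is a stand-in, not the Influence Language Model, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def frequency_ranks(sentences):
    """Map each word to its 1-based frequency rank (1 = most frequent)."""
    counts = Counter(word for sent in sentences for word in sent)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ordered, start=1)}


def frequency_rank_ratio(sentences):
    """ASSUMED definition: rank divided by vocabulary size, in (0, 1]."""
    ranks = frequency_ranks(sentences)
    vocab_size = len(ranks)
    return {word: rank / vocab_size for word, rank in ranks.items()}


def proximity_scores(sentences):
    """Stand-in for Intimacy (NOT the ILM): sum of 1/distance per word pair."""
    scores = defaultdict(float)
    for sent in sentences:
        for i, left in enumerate(sent):
            for j in range(i + 1, len(sent)):
                if left == sent[j]:
                    continue  # skip a word paired with its own repetition
                pair = tuple(sorted((left, sent[j])))
                scores[pair] += 1.0 / (j - i)
    return dict(scores)


# Toy corpus: each sentence is a list of lowercase tokens.
sentences = [
    ["strong", "tea", "is", "strong"],
    ["strong", "tea", "helps"],
]
frr = frequency_rank_ratio(sentences)
prox = proximity_scores(sentences)
```

In this toy corpus the most frequent word, `strong`, receives the smallest ratio, and the pair `("strong", "tea")` accumulates the highest proximity score because the two words are adjacent in both sentences. Plotting such a pair of values per word is one plausible starting point for the kind of visualization the abstract describes.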