A term signal is an existing text representation that describes a term as a vector of its occurrence frequencies across a number of user-defined partitions of a document. Although the term signal augments the traditional vector space model with patterns of term occurrence, its division of a document does not follow the document's actual logical structure. In this paper, we propose a novel document model, the Structure-Based Document Model with Discrete Wavelet Transforms (SDMDWT), that exploits both the structural information of documents and mathematical transforms for document representation. SDMDWT enhances the term signal concept by taking a document's structural information into account during document division. We evaluated the model on standard data sets from two different domains, WebKB 4-Universities and TREC Genomics 2005, using binary classification with Support Vector Machines. The experimental results show that representing documents with SDMDWT yields promising improvements in classification performance over existing document models.
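The term signal idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`term_signal`, `structural_signal`, `haar_dwt`) are hypothetical, the choice of the Haar wavelet is an assumption (the abstract does not say which wavelet is used), and treating each logical section as one bin is only one plausible reading of "structural information during document division".

```python
import math


def term_signal(tokens, term, num_bins):
    """Count occurrences of `term` in `num_bins` roughly equal-width partitions."""
    signal = [0] * num_bins
    n = len(tokens)
    for i, tok in enumerate(tokens):
        if tok == term:
            # Map token position i to its partition index in [0, num_bins).
            signal[i * num_bins // n] += 1
    return signal


def structural_signal(sections, term):
    """Assumed structure-based variant: one bin per logical section."""
    return [section.count(term) for section in sections]


def haar_dwt(signal):
    """Full Haar decomposition; assumes len(signal) is a power of two."""
    approx = [float(x) for x in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Prepend so coarser-level details end up ahead of finer-level ones.
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx + details


tokens = "wavelet models rank text with wavelet wavelet features".split()
signal = term_signal(tokens, "wavelet", 4)   # [1, 0, 1, 1]
coefficients = haar_dwt(signal)

# Hypothetical logical structure: a two-token title followed by the body.
sections = [tokens[:2], tokens[2:]]
by_section = structural_signal(sections, "wavelet")   # [1, 2]
```

With equal-width partitions the bin boundaries ignore where the title ends and the body begins; the section-based variant aligns the bins with those boundaries, which is the kind of mismatch the proposed model aims to remove.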