Web-Based Document Classification Using a Trie-Based Index Structure

Authors:
Jeahyun Park; Juyoung Park; Joongmin Choi
Venue:
WI-IATW '07 Proceedings of the 2007 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Workshops
Year:
2007

Abstract

An automatic document classification system is useful for managing massive document collections such as the Web. However, the complexity of the classification process has made such systems difficult to apply in general-purpose services. In this paper, we propose an efficient data structure for document classification and develop a classification system based on a trie-based index structure. This data structure reduces the overhead of classifying documents with naive Bayesian probabilistic models, which makes commercial applications feasible. In our system, both learning and classification are performed through a Web-based user interface rather than a remote application, which makes the classification process easy to control and gives flexibility in supplying documents from diverse sources.
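The abstract does not spell out the node layout, so the following is only a minimal sketch under assumptions of our own: every trie node that ends a term stores that term's per-class frequencies, and classification is multinomial naive Bayes with Laplace smoothing. The class name `TrieNaiveBayes`, its methods `learn` and `classify`, and the smoothing parameter `alpha` are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import defaultdict


class TrieNode:
    """Trie node; nodes that end a term hold that term's per-class counts (assumed layout)."""
    __slots__ = ("children", "class_counts")

    def __init__(self):
        self.children = {}         # character -> TrieNode
        self.class_counts = None   # class label -> term frequency, set only at term ends


class TrieNaiveBayes:
    """Hypothetical multinomial naive Bayes whose term statistics live in a character trie."""

    def __init__(self, alpha=1.0):
        self.root = TrieNode()
        self.alpha = alpha                   # Laplace smoothing constant
        self.doc_counts = defaultdict(int)   # class -> number of training documents
        self.term_totals = defaultdict(int)  # class -> total term occurrences
        self.vocab_size = 0                  # number of distinct terms stored in the trie

    def _lookup(self, term, create=False):
        """Walk the trie one character at a time; optionally create missing nodes."""
        node = self.root
        for ch in term:
            nxt = node.children.get(ch)
            if nxt is None:
                if not create:
                    return None
                nxt = node.children[ch] = TrieNode()
            node = nxt
        return node

    def learn(self, tokens, label):
        """Add one labeled training document (a list of terms)."""
        self.doc_counts[label] += 1
        for term in tokens:
            node = self._lookup(term, create=True)
            if node.class_counts is None:
                node.class_counts = defaultdict(int)
                self.vocab_size += 1
            node.class_counts[label] += 1
            self.term_totals[label] += 1

    def classify(self, tokens):
        """Return the class with the highest log posterior for the given terms."""
        total_docs = sum(self.doc_counts.values())
        # One trie walk per query term; terms never seen in training are skipped.
        found = []
        for term in tokens:
            node = self._lookup(term)
            if node is not None and node.class_counts is not None:
                found.append(node.class_counts)
        scores = {}
        for label, n_docs in self.doc_counts.items():
            denom = self.term_totals[label] + self.alpha * self.vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for counts in found:
                score += math.log((counts.get(label, 0) + self.alpha) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


clf = TrieNaiveBayes()
clf.learn(["stock", "market", "price"], "finance")
clf.learn(["stock", "trade", "bank"], "finance")
clf.learn(["match", "goal", "team"], "sports")
clf.learn(["team", "player", "score"], "sports")
print(clf.classify(["market", "bank", "price"]))  # finance
print(clf.classify(["goal", "player"]))           # sports
```

In this sketch, terms with common prefixes share trie paths (for example, "market" and "match" share "ma"). Each query term is therefore resolved by a single walk whose length is bounded by the term's length, and all of its class statistics are read from one node rather than from a separate table per class.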