International Journal on Document Analysis and Recognition
This paper proposes a new algorithm for classifying Web documents into a set of categories. The technique analyzes the relationships between documents and the terms they contain, producing a set of rules that relate a document's category to its terms and their frequencies. Each document is represented by a graph that correlates its most frequent combined words with its category, and the relationships among these graphs and the documents' categories are captured. The technique has three phases. The first is a training phase in which human experts determine the categories of different web pages and articles, and the supervised classification algorithm combines these categories with appropriately weighted index terms according to the highest-supported rules among the most frequent words. The second is the blind categorization phase, in which a web crawler crawls the World Wide Web to build a database that is categorized according to the results of the first phase; this database contains URLs and their categories. The third phase applies the proposed graph representation technique to the whole set of documents in each category to determine that category's final graph representation. The third phase produces better classification rules because the sample size is larger, with no additional cost of supervised categorization. Experiments are conducted on data sets collected from different Web portals.
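The three phases can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the graph construction, the rule-support measure, or the crawler, so everything below is an assumption made for illustration. Here a document graph is taken to be the set of word pairs among a document's `top_k` most frequent words, weighted by the number of sentences in which both words occur; a category graph is the sum of its documents' graphs; and a document is assigned to the category whose graph gives the largest total weight to the document's edges. The names `STOPWORDS`, `document_graph`, `build_category_graphs`, `classify`, and `refine`, and the sample texts, are all hypothetical, and the crawled pages are stood in for by plain strings.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; the abstract does not specify any preprocessing.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def document_graph(text, top_k=3):
    """Edges between the top_k most frequent words of a document, weighted by
    the number of sentences in which both words appear (assumed definition)."""
    sentences = [
        [w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOPWORDS]
        for s in re.split(r"[.!?]+", text)
    ]
    counts = Counter(w for s in sentences for w in s)
    top = {w for w, _ in counts.most_common(top_k)}
    graph = Counter()
    for s in sentences:
        present = sorted(set(s) & top)  # sorted so each edge has one canonical form
        for edge in combinations(present, 2):
            graph[edge] += 1
    return graph


def build_category_graphs(labeled_docs, top_k=3):
    """Phase 1 (sketch): merge the graphs of expert-labeled documents per category."""
    graphs = {}
    for category, text in labeled_docs:
        graphs.setdefault(category, Counter()).update(document_graph(text, top_k))
    return graphs


def classify(text, category_graphs, top_k=3):
    """Pick the category whose graph gives the most weight to this document's edges."""
    doc = document_graph(text, top_k)
    scores = {c: sum(g[e] for e in doc) for c, g in category_graphs.items()}
    return max(scores, key=scores.get)


def refine(labeled_docs, crawled_texts, top_k=3):
    """Phases 2 and 3 (sketch): label crawled texts automatically, then rebuild
    each category graph from the labeled and auto-labeled documents together."""
    graphs = build_category_graphs(labeled_docs, top_k)
    auto_labeled = [(classify(t, graphs, top_k), t) for t in crawled_texts]
    return build_category_graphs(list(labeled_docs) + auto_labeled, top_k)


# Toy data standing in for expert-labeled pages and crawled pages.
labeled = [
    ("sports", "The team won the match. The team played the match well."),
    ("finance", "The bank raised the rate. The bank cut the rate again."),
]
crawled = ["The team lost the match.", "The bank changed the rate."]

final = refine(labeled, crawled)
print(classify("A team drew a match.", final))
```

In this sketch the refined graphs in `final` accumulate edge weight from the automatically labeled texts, which mirrors the abstract's point that the third phase works with a larger sample without further manual labeling. A real system would also need a rule for ties and for documents that share no edges with any category, neither of which the abstract addresses.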