Text categorization is the task of assigning a text document to an appropriate category from a predefined set of categories. This paper focuses on the special case in which the categories are organized in a hierarchy. We present a new approach to this recently emerged subfield of text categorization. The algorithm applies an iterative learning module that gradually builds a classifier by a trial-and-error-like method. We present software developed on the basis of the algorithm to illustrate its capability on large data collections. We experimented on a very large benchmark collection, the WIPO-alpha (World Intellectual Property Organization, Geneva, Switzerland, 2002) English patent database, which consists of about 75000 XML documents distributed over 5000 categories. Our software is able to index the corpus quickly and creates a classifier in a few iteration cycles. We present the results achieved by the classifier with respect to various test settings.
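The abstract does not specify the learning rule, so the following is only a minimal sketch of what an iterative, trial-and-error-like learner over a category hierarchy could look like. It assumes a perceptron-style, error-driven update and greedy top-down classification, neither of which is confirmed as the paper's method. All names (`train`, `classify`, `path_to_root`, the toy categories and documents) are hypothetical, and every document is assumed to be labeled with a leaf category.

```python
from collections import defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real system would index the corpus properly.
    return text.lower().split()


def path_to_root(category, parent):
    # Return the category followed by all of its ancestors.
    nodes = []
    while category is not None:
        nodes.append(category)
        category = parent[category]
    return nodes


def score(category, terms, weights):
    # Sum of the category's term weights over the document's terms.
    return sum(weights[category][t] for t in terms)


def classify(terms, weights, children):
    # Greedy top-down descent: pick the best-scoring child at every level.
    node = None  # None stands for the (virtual) root of the hierarchy
    while children.get(node):
        node = max(children[node], key=lambda c: score(c, terms, weights))
    return node


def train(docs, parent, max_cycles=20, step=1.0):
    # Assumed perceptron-style rule: after each misclassification, reinforce
    # the categories on the correct path and penalize the wrongly chosen ones.
    weights = {c: defaultdict(float) for c in parent}
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    for _ in range(max_cycles):
        errors = 0
        for text, label in docs:
            terms = tokenize(text)
            predicted = classify(terms, weights, children)
            if predicted == label:
                continue
            errors += 1
            correct = path_to_root(label, parent)
            for c in correct:
                for t in terms:
                    weights[c][t] += step
            for c in path_to_root(predicted, parent):
                if c not in correct:
                    for t in terms:
                        weights[c][t] -= step
        if errors == 0:  # stop once a full cycle produces no mistakes
            break
    return weights, children


# Toy hierarchy (hypothetical): child -> parent, top-level categories map to None.
parent = {"A": None, "B": None, "A1": "A", "A2": "A", "B1": "B"}
docs = [
    ("engine piston fuel", "A1"),
    ("engine brake wheel", "A2"),
    ("protein enzyme cell", "B1"),
]
weights, children = train(docs, parent)
```

On this toy data the loop stops after a few cycles, once every training document is routed to its own leaf. A hierarchy with thousands of categories would need sparse indexing and held-out evaluation, which the sketch leaves out.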