Experiment with a hierarchical text categorization method on the WIPO-alpha patent collection

Authors:
Domonkos Tikk;György Biró
Venue:
ISUMA '03 Proceedings of the 4th International Symposium on Uncertainty Modelling and Analysis
Year:
2003

Cited by:
The Ferrety algorithm for the KDD Cup 2005 problem. ACM SIGKDD Explorations Newsletter
Classifying web documents in a hierarchy of categories: a comprehensive study. Journal of Intelligent Information Systems
Voting with a parameterized veto strategy: solving the KDD Cup 2006 problem by means of a classifier committee. ACM SIGKDD Explorations Newsletter
Topic and language specific internet search engine. Acta Cybernetica
Preferential text classification: learning algorithms and evaluation measures. Information Retrieval
Patent classification system using a new hybrid genetic algorithm support vector machine. Applied Soft Computing
A survey of hierarchical classification across different application domains. Data Mining and Knowledge Discovery

Abstract

Text categorization is the task of assigning a text document to an appropriate category in a predefined set of categories. This paper focuses on the special case where the categories are organized in a hierarchy. We present a new approach to this recently emerged subfield of text categorization. The algorithm applies an iterative learning module that gradually builds a classifier by a trial-and-error-like method. We present software developed on the basis of the algorithm to illustrate its capability on large data collections. We experimented on a very large benchmark collection, the WIPO-alpha (World Intellectual Property Organization, Geneva, Switzerland, 2002) English patent database, which consists of about 75,000 XML documents distributed over 5,000 categories. Our software indexes the corpus quickly and creates a classifier in a few iteration cycles. We present the results achieved by the classifier w.r.t. various test settings.
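
The abstract does not spell out the learning procedure, so the following is only a minimal sketch of what an iterative, trial-and-error-style hierarchical classifier could look like. It is not the authors' algorithm. The toy taxonomy, the perceptron-like weight update, and every name in it (TAXONOMY, DOCS, classify, train) are assumptions made for illustration. Documents are classified by greedy top-down descent through the hierarchy, and whenever a training document ends up on the wrong path, the term weights of the correct categories are raised and those of the wrongly chosen categories are lowered. The passes repeat until no training document is misclassified.

```python
from collections import defaultdict

# Hypothetical toy taxonomy (parent -> children); not the real patent hierarchy.
TAXONOMY = {
    "root": ["chemistry", "electricity"],
    "chemistry": ["polymers", "fertilizers"],
    "electricity": ["batteries", "antennas"],
}
PARENT = {child: parent for parent, kids in TAXONOMY.items() for child in kids}

# Hypothetical training documents: (tokens, leaf category).
DOCS = [
    (["polymer", "resin", "chain"], "polymers"),
    (["nitrogen", "soil", "fertilizer"], "fertilizers"),
    (["lithium", "cell", "electrode"], "batteries"),
    (["radio", "signal", "antenna"], "antennas"),
]


def path_to(leaf):
    """Return the categories from the top level down to `leaf`."""
    path, node = [], leaf
    while node != "root":
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def score(weights, category, tokens):
    """Sum the category's weights over the document's tokens."""
    return sum(weights[category][t] for t in tokens)


def classify(weights, tokens):
    """Greedy top-down descent: pick the best-scoring child at each level."""
    node, path = "root", []
    while node in TAXONOMY:
        node = max(TAXONOMY[node], key=lambda c: score(weights, c, tokens))
        path.append(node)
    return path


def train(docs, max_iterations=10, step=1.0):
    """Repeat classify-and-correct passes until a pass makes no error."""
    weights = defaultdict(lambda: defaultdict(float))
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        errors = 0
        for tokens, leaf in docs:
            predicted, correct = classify(weights, tokens), path_to(leaf)
            if predicted == correct:
                continue
            errors += 1
            # Assumed update rule: reward the right category and penalize
            # the wrongly chosen one at every level where the paths differ.
            for good, bad in zip(correct, predicted):
                if good != bad:
                    for t in tokens:
                        weights[good][t] += step
                        weights[bad][t] -= step
        if errors == 0:
            break
    return weights, iteration


weights, iterations = train(DOCS)
print(iterations, classify(weights, ["lithium", "electrode"]))
```

On this toy data the loop stops after a few passes. A real collection of the size described above would need proper term weighting, an inverted index, and a stopping criterion that tolerates residual training errors.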