We explore how to organize large text databases hierarchically by topic to support better searching, browsing, and filtering. Many corpora, such as internet directories, digital libraries, and patent databases, are manually organized into topic hierarchies, also called taxonomies. Like indices for relational data, taxonomies make search and access more efficient. However, the exponential growth in the volume of on-line textual information makes it nearly impossible to maintain such a taxonomic organization by hand for large, fast-changing corpora. We describe an automatic system that starts with a small sample of the corpus in which topics have been assigned by hand, and then updates the database with new documents as the corpus grows, assigning topics to these new documents with high speed and accuracy. To do this, we use techniques from statistical pattern recognition to efficiently separate the feature words, or discriminants, from the noise words at each node of the taxonomy. Using these, we build a multilevel classifier. At each node, this classifier can ignore the large number of “noise” words in a document. Thus, the classifier has a small model size and is very fast. Owing to the use of context-sensitive features, the classifier is also very accurate. As a by-product, we can compute for each document a set of terms that occur significantly more often in it than in the classes to which it belongs. We describe the design and implementation of our system, stressing how to exploit standard, efficient relational operations like sorts and joins. We report on experiences with the Reuters newswire benchmark, the US patent database, and web document samples from Yahoo!. We discuss applications where our system can improve searching and filtering capabilities.
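To make the idea of separating discriminants from noise words at one taxonomy node more concrete, here is a minimal sketch. The abstract does not say which statistic the system uses, so the Fisher-style ratio of between-class to within-class variance below is an assumption chosen for illustration, and the names `fisher_scores` and `top_discriminants` are hypothetical.

```python
from collections import Counter


def fisher_scores(docs_by_class):
    """Score each term at one taxonomy node (illustrative assumption).

    docs_by_class maps a child-class name to a list of documents,
    each document being a list of tokens. Returns {term: score}.
    """
    # Relative term frequencies per document, grouped by class.
    freqs = {}
    for cls, docs in docs_by_class.items():
        freqs[cls] = []
        for tokens in docs:
            n = len(tokens) or 1  # guard against empty documents
            freqs[cls].append({t: k / n for t, k in Counter(tokens).items()})

    classes = list(freqs)
    vocab = {t for docs in freqs.values() for d in docs for t in d}
    scores = {}
    for term in vocab:
        # Mean frequency of the term in each class.
        means = {
            c: sum(d.get(term, 0.0) for d in freqs[c]) / len(freqs[c])
            for c in classes
        }
        # Spread of class means: large when the term separates classes.
        between = sum(
            (means[a] - means[b]) ** 2
            for i, a in enumerate(classes)
            for b in classes[i + 1:]
        )
        # Average variance inside each class: large when the term is erratic.
        within = sum(
            sum((d.get(term, 0.0) - means[c]) ** 2 for d in freqs[c])
            / len(freqs[c])
            for c in classes
        )
        scores[term] = between / (within + 1e-9)  # small constant avoids 0/0
    return scores


def top_discriminants(docs_by_class, k):
    """Keep the k highest-scoring terms; the rest are treated as noise."""
    scores = fisher_scores(docs_by_class)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

In this sketch, a word that appears at the same rate in every class (such as "the") scores zero and is dropped as noise, while words concentrated in one class rise to the top. Running the selection separately at each node would give each level of the hierarchy its own small, context-sensitive vocabulary.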