Automatic categorisation is an important technique for managing large document collections. Categorisation can be used to store or locate documents that satisfy an information need when that need cannot be expressed as a concise list of query terms. Inverted indexes are used in all query-based retrieval systems to allow efficient query processing. In this paper, we propose applying inverted indexes to categorisation, with the aim of developing a fast, scalable, and accurate approach. Specifically, we propose three variants of inverted indexing that reduce index size: first, quantisation of term-category weights; second, compression of the quantised weights; and last, storing only those weights that significantly affect the categorisation process. We show that our techniques permit fast, accurate categorisation: index size is reduced by orders of magnitude compared to conventional inverted indexing, and categorisation accuracy is preserved.
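The three index-size reductions described above can be illustrated with a small sketch. It assumes a linear categoriser in which a document's score for each category is the sum, over the document's terms, of term frequency times a term-category weight. Each term's postings list holds (category, weight) pairs. The uniform quantiser, the variable-byte code applied to category-id gaps and quantised weights, and the absolute-value pruning threshold are illustrative choices, not necessarily the schemes the paper evaluates. All function names and the toy weights are hypothetical.

```python
from collections import defaultdict


def quantise(w, lo, hi, bits):
    # Assumed scheme: uniform (linear) quantisation of w in [lo, hi] to an integer in [0, 2**bits - 1].
    levels = (1 << bits) - 1
    if hi == lo:
        return 0
    q = round((w - lo) / (hi - lo) * levels)
    return max(0, min(levels, q))


def dequantise(q, lo, hi, bits):
    # Map a bucket index back to the grid point it represents.
    levels = (1 << bits) - 1
    return lo + (hi - lo) * q / levels


def vbyte_encode(n):
    # Variable-byte code: 7 data bits per byte, high bit set on the final byte.
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)
        n >>= 7
    out.append(n | 0x80)
    return bytes(out)


def vbyte_decode_all(data):
    # Decode a concatenation of variable-byte codes into a list of integers.
    nums, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:
            nums.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return nums


def build_index(weights, bits=4, threshold=0.0):
    # weights: term -> {category_id: weight}, from some trained linear classifier (hypothetical input).
    # Returns term -> compressed postings, plus the (lo, hi) range needed to dequantise.
    all_w = [w for cats in weights.values() for w in cats.values()]
    lo, hi = min(all_w), max(all_w)
    index = {}
    for term, cats in weights.items():
        buf = bytearray()
        prev = 0
        for cat in sorted(cats):
            w = cats[cat]
            if abs(w) < threshold:
                continue  # pruning: drop weights assumed to have little impact on scores
            buf += vbyte_encode(cat - prev)  # category ids stored as gaps
            buf += vbyte_encode(quantise(w, lo, hi, bits))
            prev = cat
        if buf:
            index[term] = bytes(buf)
    return index, lo, hi


def categorise(doc_terms, index, lo, hi, bits=4):
    # doc_terms: term -> frequency. Accumulates freq * weight per category,
    # reading only the postings of terms that occur in the document.
    scores = defaultdict(float)
    for term, freq in doc_terms.items():
        data = index.get(term)
        if data is None:
            continue
        nums = vbyte_decode_all(data)
        cat = 0
        for gap, q in zip(nums[0::2], nums[1::2]):
            cat += gap
            scores[cat] += freq * dequantise(q, lo, hi, bits)
    return sorted(scores, key=scores.get, reverse=True)


# Toy example: three terms, three categories (made-up weights).
weights = {
    "goal": {0: 0.9, 1: -0.2, 2: 0.05},
    "market": {0: -0.3, 1: 1.1, 2: 0.02},
    "vote": {1: 0.1, 2: 1.0},
}
index, lo, hi = build_index(weights, bits=4, threshold=0.08)
print(categorise({"goal": 2, "vote": 1}, index, lo, hi, bits=4))  # [0, 2, 1]
```

Here each stored weight takes 4 bits of information instead of a floating-point value, and the two weights below the threshold are not stored at all. Fewer bits or a higher threshold shrink the postings further at the cost of coarser scores. The parameter values above are arbitrary and are not the settings studied in the paper.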