Athena: Mining-Based Interactive Management of Text Database

Authors:
Rakesh Agrawal; Roberto J. Bayardo, Jr.; Ramakrishnan Srikant
Venue:
EDBT '00 Proceedings of the 7th International Conference on Extending Database Technology: Advances in Database Technology
Year:
2000





Abstract

We describe Athena: a system for creating, exploiting, and maintaining a hierarchy of textual documents through interactive mining-based operations. Requirements of any such system include speed and minimal end-user effort. Athena satisfies these requirements through linear-time classification and clustering engines which are applied interactively to speed the development of accurate models. Naive Bayes classifiers are recognized to be among the best for classifying text. We show that our specialization of the Naive Bayes classifier is considerably more accurate (7 to 29% absolute increase in accuracy) than a standard implementation. Our enhancements include using Lidstone's law of succession instead of Laplace's law, under-weighting long documents, and over-weighting author and subject. We also present a new interactive clustering algorithm, C-Evolve, for topic discovery. C-Evolve first finds highly accurate cluster digests (partial clusters), gets user feedback to merge and correct these digests, and then uses the classification algorithm to complete the partitioning of the data. By allowing this interactivity in the clustering process, C-Evolve achieves considerably higher clustering accuracy (10 to 20% absolute increase in our experiments) than the popular K-Means and agglomerative clustering methods.
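
The abstract names Lidstone's law of succession as one of the Naive Bayes enhancements but does not spell it out. As a rough illustration only, the sketch below shows a generic multinomial Naive Bayes text classifier whose word probabilities are smoothed as (count + lam) / (total + lam * |V|). This is not Athena's implementation: the function names `train` and `classify`, the toy documents, and the value `lam=0.1` are assumptions made for the example, and the other enhancements (under-weighting long documents, over-weighting author and subject) are not modeled.

```python
import math
from collections import Counter, defaultdict


def train(docs, lam=0.1):
    """Fit a multinomial Naive Bayes model with Lidstone smoothing.

    docs: list of (label, tokens) pairs.
    lam:  smoothing constant (assumed value); lam=1.0 gives Laplace's law.
    """
    vocab = {tok for _, toks in docs for tok in toks}
    class_docs = Counter(label for label, _ in docs)

    # Per-class word frequencies.
    word_counts = defaultdict(Counter)
    for label, toks in docs:
        word_counts[label].update(toks)

    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}

    log_like = {}
    for c in class_docs:
        total = sum(word_counts[c].values())
        denom = total + lam * len(vocab)
        # Lidstone estimate: (count + lam) / (total + lam * |V|).
        log_like[c] = {
            w: math.log((word_counts[c][w] + lam) / denom) for w in vocab
        }
    return log_prior, log_like, vocab


def classify(tokens, model):
    """Return the class with the highest posterior log score."""
    log_prior, log_like, vocab = model
    scores = {}
    for c in log_prior:
        # Tokens never seen in training are ignored.
        scores[c] = log_prior[c] + sum(
            log_like[c][t] for t in tokens if t in vocab
        )
    return max(scores, key=scores.get)


# Toy training data, invented for this example.
DOCS = [
    ("sports", ["ball", "goal", "team"]),
    ("sports", ["team", "win"]),
    ("tech", ["code", "bug"]),
    ("tech", ["code", "team", "release"]),
]

model = train(DOCS, lam=0.1)
label = classify(["goal", "team"], model)
```

Setting `lam=1.0` in this sketch recovers Laplace's law. A smaller `lam` moves less probability mass onto words a class has never seen, which is the general motivation for preferring Lidstone's law on sparse text data.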