Document Categorization and Query Generation on the World Wide Web Using WebACE

Authors:
Daniel Boley;Maria Gini;Robert Gross;Eui-Hong (Sam) Han;Kyle Hastings;George Karypis;Vipin Kumar;Bamshad Mobasher;Jerome Moore
Affiliations:
Department of Computer Science and Engineering, University of Minnesota, 4-192 EE/CSci Building, 200 Union Street SE, Minneapolis, MN 55455, USA (all authors)
Venue:
Artificial Intelligence Review - Special issue on data mining on the Internet
Year:
1999

Citing 0
Cited 50

Unsupervised updating of a classification tree in a dynamic environment

Proceedings of the third annual conference on Autonomous Agents
Automatic personalization based on Web usage mining

Communications of the ACM
Textual data mining of service center call records

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Fast supervised dimensionality reduction algorithm with applications to document categorization & retrieval

Proceedings of the ninth international conference on Information and knowledge management
Co-clustering documents and words using bipartite spectral graph partitioning

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Mining the web to create minority language corpora

Proceedings of the tenth international conference on Information and knowledge management
Evaluation of hierarchical clustering algorithms for document datasets

Proceedings of the eleventh international conference on Information and knowledge management
Principal Direction Divisive Partitioning

Data Mining and Knowledge Discovery
Discovery and Evaluation of Aggregate Usage Profiles for Web Personalization

Data Mining and Knowledge Discovery
Concept Decompositions for Large Sparse Text Data Using Clustering

Machine Learning
Chameleon: Hierarchical Clustering Using Dynamic Modeling

Computer
Centroid-Based Document Classification: Analysis and Experimental Results

PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery
Online Learning for Web Query Generation: Finding Documents Matching a Minority Concept on the Web

WI '01 Proceedings of the First Asia-Pacific Conference on Web Intelligence: Research and Development
Using Explicit, A Priori Contextual Knowledge in an Intelligent Web Search Agent

CONTEXT '01 Proceedings of the Third International and Interdisciplinary Conference on Modeling and Using Context
Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification

PAKDD '01 Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining
A Data-Clustering Algorithm on Distributed Memory Multiprocessors

Revised Papers from Large-Scale Parallel Data Mining, Workshop on Large-Scale Parallel KDD Systems, SIGKDD
Performing Binary-Categorization on Multiple-Record Web Documents Using Information Retrieval Models and Application Ontologies

World Wide Web
Using web helper agent profiles in query generation

AAMAS '03 Proceedings of the second international joint conference on Autonomous agents and multiagent systems
Intelligent metasearch engine for knowledge management

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Document clustering via adaptive subspace iteration

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Efficient Phrase-Based Document Indexing for Web Document Clustering

IEEE Transactions on Knowledge and Data Engineering
QueryTracker: An Agent for Tracking Persistent Information Needs

AAMAS '04 Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 1
Building Minority Language Corpora by Learning to Generate Web Search Queries

Knowledge and Information Systems
The BankSearch web document dataset: investigating unsupervised clustering and category similarity

Journal of Network and Computer Applications - Special issue on computational intelligence on the internet
Intelligent web traffic mining and analysis

Journal of Network and Computer Applications - Special issue on computational intelligence on the internet
Hierarchical Clustering Algorithms for Document Datasets

Data Mining and Knowledge Discovery
A general model for clustering binary data

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
A parallel hybrid web document clustering algorithm and its performance study

The Journal of Supercomputing - Special issue: Parallel and distributed processing and applications
Discover the semantic topology in high-dimensional data

Expert Systems with Applications: An International Journal
On the relationships between user profiles and navigation sessions in virtual communities: A data-mining approach

Intelligent Data Analysis
A comparative analysis on the bisecting K-means and the PDDP clustering algorithms

Intelligent Data Analysis
Algorithms for clustering high dimensional and distributed data

Intelligent Data Analysis
Bipartite isoperimetric graph partitioning for data co-clustering

Data Mining and Knowledge Discovery
Hypergraph partitioning for document clustering: a unified clique perspective

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Distributed collaborative Web document clustering using cluster keyphrase summaries

Information Fusion
Ensemble document clustering using weighted hypergraph generated by NMF

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
A simplicial complex, a hypergraph, structure in the latent semantic space of document clustering

International Journal of Approximate Reasoning
Mining Indirect Association Rules for Web Recommendation

International Journal of Applied Mathematics and Computer Science
ROSA: multi-agent system for web services personalization

AWIC'03 Proceedings of the 1st international Atlantic web intelligence conference on Advances in web intelligence
Semantic based real-time clustering for PubMed literatures

DS'07 Proceedings of the 10th international conference on Discovery science
Fast categorization of web documents represented by graphs

WebKDD'06 Proceedings of the 8th Knowledge discovery on the web international conference on Advances in web mining and web usage analysis
Automatic query generation and query relevance measurement for unsupervised language model adaptation of speech recognition

EURASIP Journal on Audio, Speech, and Music Processing
Improving document clustering using Okapi BM25 feature weighting

Information Retrieval
Document mining based on semantic understanding of text

CIARP'06 Proceedings of the 11th Iberoamerican conference on Progress in Pattern Recognition, Image Analysis and Applications
Succinct initialization methods for clustering algorithms

ICIC'11 Proceedings of the 7th international conference on Advanced Intelligent Computing
Efficient stochastic algorithms for document clustering

Information Sciences: an International Journal
Construction of Domain Ontologies: Sourcing the World Wide Web

International Journal of Intelligent Information Technologies
Effective measures for inter-document similarity

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
On Knowledge-Enhanced Document Clustering

International Journal of Information Retrieval Research
Boosting for multiclass semi-supervised learning

Pattern Recognition Letters

Quantified Score

Hi-index	0.02

Abstract

We present WebACE, an agent for exploring and categorizing documents on the World Wide Web based on a user profile. The heart of the agent is an unsupervised categorization of a set of documents, combined with a process for generating new queries that is used to search for new related documents and for filtering the resulting documents to extract the ones most closely related to the starting set. The document categories are not given a priori. We present the overall architecture and describe two novel algorithms which provide significant improvement over Hierarchical Agglomeration Clustering and AutoClass algorithms and form the basis for the query generation and search component of the agent. We report on the results of our experiments comparing these new algorithms with more traditional clustering algorithms, and we show that our algorithms are fast and scalable.
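
The abstract describes a loop of categorizing documents, generating queries from a category, and filtering the search results against the starting set. The following is a minimal Python sketch of the last two steps only, under stated assumptions: it is not the paper's two novel algorithms, the cluster is assumed to be already available, and the term-ranking heuristic, the cosine-similarity filter, the threshold value, and all function names (`generate_query`, `filter_results`, and the helpers) are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, whitespace split, letters only.
    return [w for w in text.lower().split() if w.isalpha()]


def term_vector(text):
    # Bag-of-words term counts for one document.
    return Counter(tokenize(text))


def centroid(vectors):
    # Sum of term counts over all documents in a cluster.
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def generate_query(cluster_docs, k=3):
    # Hypothetical heuristic: rank terms by how many cluster documents
    # contain them, then by total frequency, then alphabetically.
    vectors = [term_vector(d) for d in cluster_docs]
    doc_freq = Counter(t for v in vectors for t in v)
    term_freq = centroid(vectors)
    ranked = sorted(doc_freq, key=lambda t: (-doc_freq[t], -term_freq[t], t))
    return ranked[:k]


def filter_results(cluster_docs, candidates, threshold=0.3):
    # Keep candidates whose similarity to the cluster centroid meets
    # the (arbitrarily chosen) threshold.
    c = centroid(term_vector(d) for d in cluster_docs)
    return [d for d in candidates if cosine(c, term_vector(d)) >= threshold]


cluster = [
    "stock market prices fall",
    "stock market rally continues",
    "investors watch stock prices",
]
candidates = ["stock market news today", "recipe for apple pie"]

query = generate_query(cluster)
kept = filter_results(cluster, candidates)
```

In this toy run the generated query terms would be sent to a search engine, and only the returned documents that resemble the starting cluster are retained. A real agent would also need the clustering step and a stop-word list, both omitted here.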