Fast construction of generalized suffix trees over a very large alphabet

Authors: Zhixiang Chen, Richard Fowler, Ada Wai-Chee Fu, Chunyue Wang
Affiliations: Department of Computer Science, University of Texas-Pan American, Edinburg, TX (Zhixiang Chen, Richard Fowler, Chunyue Wang); Department of Computer Science, Chinese University of Hong Kong, Shatin, N.T., Hong Kong (Ada Wai-Chee Fu)
Venue: COCOON'03: Proceedings of the 9th Annual International Conference on Computing and Combinatorics
Year: 2003

References (13):

Characterizing browsing strategies in the World-Wide Web. Proceedings of the Third International World-Wide Web Conference on Technology, Tools and Applications.
Algorithms on strings, trees, and sequences: computer science and computational biology.
Discovering Internet marketing intelligence through online analytical web usage mining. ACM SIGMOD Record.
A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM (JACM).
Web mining research: a survey. ACM SIGKDD Explorations Newsletter.
Efficient Data Mining for Path Traversal Patterns. IEEE Transactions on Knowledge and Data Engineering.
A Database Index to Large Biological Sequences. Proceedings of the 27th International Conference on Very Large Data Bases.
Optimal Algorithms for Finding User Access Sessions from Very Large Web Logs. PAKDD '02: Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining.
Knowledge discovery from users Web-page navigation. RIDE '97: Proceedings of the 7th International Workshop on Research Issues in Data Engineering (RIDE '97): High Performance Database Management for Large-Scale Applications.
Linear Time Algorithms for Finding Maximal Forward References. ITCC '03: Proceedings of the International Conference on Information Technology: Computers and Communications.
WhatNext: A Prediction System for Web Requests using N-gram Sequence Models. WISE '00: Proceedings of the First International Conference on Web Information Systems Engineering (WISE'00), Volume 1.
Mining longest repeating subsequences to predict world wide web surfing. USITS'99: Proceedings of the 2nd Conference on USENIX Symposium on Internet Technologies and Systems, Volume 2.
Linear pattern matching algorithms. SWAT '73: Proceedings of the 14th Annual Symposium on Switching and Automata Theory (SWAT 1973).

Abstract

This work is motivated by real-world problems such as mining frequent traversal path patterns from very large Web logs. Such problems can be solved with generalized suffix trees over a very large alphabet. However, traditional algorithms such as those of Weiner, Ukkonen, and McCreight do not guarantee practical performance when both the alphabet and the set of strings are as large as they are in these applications. We design two new algorithms for fast construction of generalized suffix trees over a very large alphabet and analyze their performance in comparison with the well-known Ukkonen algorithm. The analysis shows that both new algorithms outperform Ukkonen's algorithm and handle large alphabets and large string sets well.
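To make the Web-log use of a generalized suffix tree concrete, here is a minimal Python sketch under stated assumptions. It is neither Ukkonen's algorithm nor either of the two algorithms proposed in this paper: it inserts every suffix naively, which takes O(Σ n_k²) time for sessions of lengths n_k. Each session is assumed to be a list of non-negative integer page IDs (so the alphabet is the set of all page IDs), each session gets a hypothetical unique negative terminator, and children are stored in hash maps so that a child lookup takes expected constant time however large the alphabet is. The names build_gst, support, and frequent_paths are illustrative, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Set, Tuple


@dataclass
class Node:
    # The edge entering this node is labelled seqs[sid][start:end].
    sid: int = -1
    start: int = 0
    end: int = 0
    # Children keyed by the first symbol of their edge label. A hash map gives
    # expected O(1) lookup for any alphabet size, with no |alphabet|-sized array.
    children: Dict[int, "Node"] = field(default_factory=dict)
    leaf_of: Optional[int] = None  # id of the string whose suffix ends here


def build_gst(strings: Sequence[Sequence[int]]) -> Tuple[Node, List[List[int]]]:
    """Naive generalized suffix tree: inserts every suffix of every string."""
    # Symbols are assumed to be non-negative ints (e.g. page IDs). String k gets
    # the hypothetical unique terminator -(k + 1), so every suffix ends at a leaf.
    seqs = [list(s) + [-(k + 1)] for k, s in enumerate(strings)]
    root = Node()
    for k, seq in enumerate(seqs):
        for i in range(len(seq)):
            _insert_suffix(root, seqs, k, i)
    return root, seqs


def _insert_suffix(root: Node, seqs: List[List[int]], k: int, i: int) -> None:
    seq, node = seqs[k], root
    while True:
        child = node.children.get(seq[i])
        if child is None:  # no edge starts with seq[i]: hang a new leaf here
            node.children[seq[i]] = Node(k, i, len(seq), leaf_of=k)
            return
        label, j = seqs[child.sid], child.start
        while j < child.end and label[j] == seq[i]:  # walk down the edge
            i += 1
            j += 1
        if j == child.end:  # whole edge matched: continue from the child
            node = child
            continue
        # Mismatch inside the edge: split it at offset j with a new internal node.
        mid = Node(child.sid, child.start, j)
        node.children[label[child.start]] = mid
        child.start = j
        mid.children[label[j]] = child
        mid.children[seq[i]] = Node(k, i, len(seq), leaf_of=k)
        return


def _leaf_ids(node: Node) -> Set[int]:
    ids, stack = set(), [node]
    while stack:  # iterative DFS over the subtree
        n = stack.pop()
        if n.leaf_of is not None:
            ids.add(n.leaf_of)
        stack.extend(n.children.values())
    return ids


def support(root: Node, seqs: List[List[int]], pattern: Sequence[int]) -> int:
    """Number of input strings containing pattern as a contiguous substring."""
    node, p = root, 0
    while p < len(pattern):
        child = node.children.get(pattern[p])
        if child is None:
            return 0
        label, j = seqs[child.sid], child.start
        while j < child.end and p < len(pattern):
            if label[j] != pattern[p]:
                return 0
            j += 1
            p += 1
        node = child  # pattern may end mid-edge; same leaves as the child
    return len(_leaf_ids(node))


def frequent_paths(root: Node, seqs: List[List[int]], min_support: int) -> Dict[Tuple[int, ...], int]:
    """Path labels of internal nodes that occur in at least min_support strings."""
    out: Dict[Tuple[int, ...], int] = {}
    def visit(node: Node, path: List[int]) -> Set[int]:
        ids = {node.leaf_of} if node.leaf_of is not None else set()
        for child in node.children.values():
            ids |= visit(child, path + seqs[child.sid][child.start:child.end])
        if node is not root and node.leaf_of is None and len(ids) >= min_support:
            out[tuple(path)] = len(ids)
        return ids

    visit(root, [])
    return out


if __name__ == "__main__":
    sessions = [[10, 20, 30, 40], [20, 30, 50], [70, 20, 30, 40]]
    root, seqs = build_gst(sessions)
    print(support(root, seqs, [20, 30, 40]))  # 2
    print(sorted(frequent_paths(root, seqs, 3).items()))  # [((20, 30), 3), ((30,), 3)]
```

Here support counts how many sessions contain a path as a contiguous run of page visits, and frequent_paths reports the label of every internal node with enough support; a shorter path ending partway along an edge has the same support as the node below it, so it is not listed separately. In principle, a linear-time construction such as Ukkonen's could replace the naive insertion loop while the two queries stay the same.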