Consider a database of time series, where each data point records the total number of users who issued a specific query to an Internet search engine. Storing and analyzing such logs can benefit a search company in several ways. First, from a data-organization perspective, query Weblogs capture important trends and statistics, so they can help enhance and optimize the search experience (e.g., keyword recommendation, discovery of news events). Second, Weblog data provide an important polling mechanism for the microeconomic aspects of a search engine, because they can facilitate and promote its advertising side (understanding what users request and when they request it). Given the sheer volume of time-series Weblogs, manipulating the logs in compressed form is a pressing necessity, both for fast data processing and for compact storage. Here, we explain how to compute lower and upper bounds on the distance between time-series logs while working directly on their compressed form. Optimal distance estimation means tighter bounds, which lead to better candidate selection and elimination and ultimately to faster search. We derive the optimal distance bounds through a careful analysis of the problem using optimization principles. The experimental evaluation suggests a clear performance advantage of the proposed method over previous compression/search techniques: it improves distance estimates by 10-30%, which in turn improves search performance by 25-80%.
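The abstract does not spell out how the bounds are computed, so the following is only a rough illustration of the general idea, not the paper's method. It is a minimal Python sketch assuming that each stored series is compressed by keeping its k largest-magnitude orthonormal DFT coefficients plus the total energy of the dropped ones, and that the query is uncompressed. The bounds on the omitted part come from the triangle inequality, a simple baseline that is valid but generally looser than the optimal bounds derived in the paper. The names `compress`, `distance_bounds`, and `nearest_neighbor` are hypothetical, and a naive O(n^2) DFT is used only to keep the example self-contained.

```python
import cmath
import math
from typing import Dict, List, Tuple

# Sketch only: simple triangle-inequality bounds, NOT the paper's optimal bounds.
# Compressed form: retained DFT coefficients (index -> value) plus the energy
# of all coefficients that were dropped.
Compressed = Tuple[Dict[int, complex], float]


def dft(x: List[float]) -> List[complex]:
    """Naive orthonormal DFT, so that sum |X_k|^2 == sum x_t^2 (Parseval)."""
    n = len(x)
    s = 1.0 / math.sqrt(n)
    return [s * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def compress(x: List[float], k: int) -> Compressed:
    """Keep the k largest-magnitude coefficients; record only the residual energy."""
    coeffs = dft(x)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = {i: coeffs[i] for i in order[:k]}
    residual = sum(abs(coeffs[i]) ** 2 for i in order[k:])
    return kept, residual


def distance_bounds(c: Compressed, q_coeffs: List[complex]) -> Tuple[float, float]:
    """Lower/upper bounds on the Euclidean distance between a compressed
    series and an uncompressed query (given as its orthonormal DFT)."""
    kept, residual = c
    # Exact contribution of the coefficients retained for the stored series.
    known = sum(abs(v - q_coeffs[i]) ** 2 for i, v in kept.items())
    # Norms of the omitted parts: |a - b| <= ||X_omit - Q_omit|| <= a + b.
    a = math.sqrt(residual)
    b = math.sqrt(sum(abs(q_coeffs[i]) ** 2
                      for i in range(len(q_coeffs)) if i not in kept))
    return math.sqrt(known + (a - b) ** 2), math.sqrt(known + (a + b) ** 2)


def nearest_neighbor(db: List[List[float]], compressed: List[Compressed],
                     query: List[float]) -> Tuple[int, float]:
    """1-NN search: bound on compressed data, fetch raw series only when needed."""
    q_coeffs = dft(query)
    bounds = [distance_bounds(c, q_coeffs) for c in compressed]
    # The smallest upper bound already caps the nearest-neighbor distance.
    best_ub = min(ub for _, ub in bounds)
    best_idx, best_dist = -1, math.inf
    # Visit candidates in order of increasing lower bound.
    for i in sorted(range(len(db)), key=lambda j: bounds[j][0]):
        if bounds[i][0] > min(best_dist, best_ub):
            break  # all remaining candidates have even larger lower bounds
        d = math.sqrt(sum((u - v) ** 2 for u, v in zip(db[i], query)))
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist
```

In a real system the raw series in `db` would typically sit on disk, so the purpose of the pruning loop is to fetch as few of them as possible. Tighter lower bounds end the loop sooner, and tighter upper bounds eliminate candidates without fetching them at all, which is why better distance estimates translate into faster search.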