Ad hoc querying is difficult on very large datasets, since it is usually not possible to have the entire dataset on disk. While compression can be used to decrease the size of the dataset, compressed data is notoriously difficult to index or access.

In this paper we consider a very large dataset comprising multiple distinct time sequences, where each point in a sequence is a numerical value. We show how to compress such a dataset into a format that supports ad hoc querying, provided that a small error can be tolerated when the data is uncompressed. Experiments on large, real-world datasets (AT&T customer calling patterns) show that the proposed method achieves an average error of less than 5% per data value after compressing to a mere 2.5% of the original space (i.e., a 40:1 compression ratio), and these figures are largely insensitive to dataset size. Experiments on aggregate queries achieved a 0.5% reconstruction error with a space requirement under 2%.
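The abstract does not name the compression technique. One standard way to obtain a lossy format that still answers point queries is a truncated singular value decomposition (SVD) of the N × M matrix, with one time sequence per row.

The sketch below uses that approach. It is an assumption for illustration, not necessarily the paper's exact method. The function names `compress`, `value_at` and `space_fraction` are hypothetical. To stay dependency-free, it finds the top-k singular triplets by power iteration with deflation rather than calling a library SVD.

A single cell x[i][j] is then reconstructed in O(k) time without decompressing anything else. The sketch also stores only k(N + M + 1) numbers in place of N·M.

```python
import math
import random


def compress(rows, k, iters=200, seed=0):
    """Hypothetical sketch: keep the top-k singular triplets (s, u, v) of an
    N x M matrix (one time sequence per row), found by power iteration with
    deflation. This assumes an SVD-style scheme, not a confirmed one."""
    rng = random.Random(seed)
    n, m = len(rows), len(rows[0])
    residual = [[float(x) for x in r] for r in rows]
    triplets = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(m)]  # positive random start
        for _ in range(iters):
            u = [sum(r[j] * v[j] for j in range(m)) for r in residual]  # u = R v
            v = [sum(residual[i][j] * u[i] for i in range(n)) for j in range(m)]  # v = R^T u
            norm = math.sqrt(sum(x * x for x in v))
            if norm == 0.0:
                return triplets  # residual is zero: nothing left to capture
            v = [x / norm for x in v]
        u = [sum(r[j] * v[j] for j in range(m)) for r in residual]
        s = math.sqrt(sum(x * x for x in u))  # singular value estimate
        if s == 0.0:
            break
        u = [x / s for x in u]
        for i in range(n):  # deflate: subtract this rank-1 component
            for j in range(m):
                residual[i][j] -= s * u[i] * v[j]
        triplets.append((s, u, v))
    return triplets


def value_at(triplets, i, j):
    """Reconstruct one cell x[i][j] in O(k) without decompressing the rest."""
    return sum(s * u[i] * v[j] for s, u, v in triplets)


def space_fraction(n, m, k):
    """Numbers kept, k * (n + m + 1), relative to the n * m original values."""
    return k * (n + m + 1) / (n * m)
```

As a usage example, take the rank-1 toy matrix `[[1, 2, 3, 4], [2, 4, 6, 8], [3, 6, 9, 12]]`. There, `value_at(compress(data, k=1), 2, 3)` returns approximately 12.0.

For a purely hypothetical 100,000 × 365 matrix with k = 10, `space_fraction` gives about 0.0275, roughly 2.7% of the original size. The design choice is that the per-row factors let any single value, or a sum over a range, be computed directly from the compressed form. Real data is not exactly low-rank, so some reconstruction error remains.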