Locality-preserving hashing in multidimensional spaces
STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
On approximating arbitrary metrics by tree metrics
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Min-wise independent permutations (extended abstract)
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Approximate nearest neighbors: towards removing the curse of dimensionality
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Syntactic clustering of the Web
Selected papers from the sixth international conference on World Wide Web
Tracking join and self-join sizes in limited storage
PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
The space complexity of approximating the frequency moments
Journal of Computer and System Sciences
A small approximately min-wise independent family of hash functions
Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
Synopsis data structures for massive data sets
Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
Selectivity estimation for Boolean queries
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
A constant factor approximation algorithm for a class of classification problems
STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
Improved classification via connectivity information
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms
Approximation algorithms for the 0-extension problem
SODA '01 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms
Approximation algorithms for the metric labeling problem via a new linear programming formulation
SODA '01 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms
Efficient and tunable similar set retrieval
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Fast, small-space algorithms for approximate histogram maintenance
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Derandomized dimensionality reduction with applications
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Counting Twig Matches in a Tree
Proceedings of the 17th International Conference on Data Engineering
Similarity Search in High Dimensions via Hashing
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries
Proceedings of the 27th International Conference on Very Large Data Bases
On Approximate Nearest Neighbors in Non-Euclidean Spaces
FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Stable distributions, pseudorandom generators, embeddings and data stream computation
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
On the Resemblance and Containment of Documents
SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
Finding Interesting Associations without Support Pruning
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
The Earth Mover's Distance under Transformation Sets
ICCV '99 Proceedings of the International Conference on Computer Vision - Volume 2
Corner Detection in Textured Color Images
ICCV '99 Proceedings of the International Conference on Computer Vision - Volume 2
Probabilistic approximation of metric spaces and its algorithmic applications
FOCS '96 Proceedings of the 37th Annual Symposium on Foundations of Computer Science
The Earth Mover's Distance as a Metric for Image Retrieval
The Earth Mover's Distance: Lower Bounds and Invariance under Translation
Perceptual metrics for image database navigation
A Metric for Distributions with Applications to Image Databases
ICCV '98 Proceedings of the Sixth International Conference on Computer Vision
An improved approximation algorithm for the 0-extension problem
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Efficient similarity search and classification via rank aggregation
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Algorithms for dynamic geometric problems over data streams
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Proceedings of the 2004 workshop on Multimedia and security
Image similarity search with compact data structures
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Nonembeddability theorems via Fourier analysis
FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science
Finding near-duplicate web pages: a large-scale evaluation of algorithms
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Very sparse random projections
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Exploiting asymmetry in hierarchical topic extraction
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Ferret: a toolkit for content-based similarity search of feature-rich data
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Randomized algorithms and NLP: using locality sensitive hash function for high speed noun clustering
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Scaling distributional similarity to large corpora
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Using sketches to estimate associations
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Efficient semantic search on DHT overlays
Journal of Parallel and Distributed Computing
Scaling up all pairs similarity search
Proceedings of the 16th international conference on World Wide Web
Detecting near-duplicates for web crawling
Proceedings of the 16th international conference on World Wide Web
Detectives: detecting coalition hit inflation attacks in advertising networks streams
Proceedings of the 16th international conference on World Wide Web
Google news personalization: scalable online collaborative filtering
Proceedings of the 16th international conference on World Wide Web
Sizing sketches: a rank-based analysis for similarity search
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A topology-aware hierarchical structured overlay network based on locality sensitive hashing scheme
Proceedings of the second workshop on Use of P2P, GRID and agents for the development of content networks
Multiple-signal duplicate detection for search evaluation
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Principles of hash-based text retrieval
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Very sparse stable random projections for dimension reduction in lα (0
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
VISTO: visual storyboard for web video browsing
Proceedings of the 6th ACM international conference on Image and video retrieval
A near linear time constant factor approximation for Euclidean bichromatic matching (cost)
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Combinatorial algorithms for web search engines: three success stories
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
A Sketch Algorithm for Estimating Two-Way and Multi-Way Associations
Computational Linguistics
Tracking Web spam with HTML style similarities
ACM Transactions on the Web (TWEB)
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Estimators and tail bounds for dimension reduction in lα (0
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Earth mover distance over high-dimensional spaces
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Tracking and modelling information diffusion across interactive online media
International Journal of Metadata, Semantics and Ontologies
Efficient similarity joins for near duplicate detection
Proceedings of the 17th international conference on World Wide Web
Microscale evolution of web pages
Proceedings of the 17th international conference on World Wide Web
Web graph similarity for anomaly detection (poster)
Proceedings of the 17th international conference on World Wide Web
Sketching in adversarial environments
STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
The power of two min-hashes for similarity search among hierarchical data objects
Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
Asymmetric distance estimation with sketches for similarity search in high-dimensional spaces
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
SpotSigs: robust and efficient near duplicate detection in large web collections
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
De-duping URLs via rewrite rules
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Locality sensitive hash functions based on concomitant rank order statistics
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Client-Friendly Classification over Random Hyperplane Hashes
ECML PKDD '08 Proceedings of the European conference on Machine Learning and Knowledge Discovery in Databases - Part II
Ed-Join: an efficient algorithm for similarity joins with edit distance constraints
Proceedings of the VLDB Endowment
Achieving both high precision and high recall in near-duplicate detection
Proceedings of the 17th ACM conference on Information and knowledge management
Efficiently matching sets of features with random histograms
MM '08 Proceedings of the 16th ACM international conference on Multimedia
Unsupervised Classifier Selection Based on Two-Sample Test
DS '08 Proceedings of the 11th International Conference on Discovery Science
Combinatorial algorithms for nearest neighbors, near-duplicates and small-world design
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Overcoming the l1 non-embeddability barrier: algorithms for product metrics
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Proceedings of the Second ACM International Conference on Web Search and Data Mining
IRLbot: Scaling to 6 billion pages and beyond
ACM Transactions on the Web (TWEB)
Leveraging discarded samples for tighter estimation of multiple-set aggregates
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Mining broad latent query aspects from search sessions
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Applying syntactic similarity algorithms for enterprise information management
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Automatic video tagging using content redundancy
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
API hyperlinking via structural overlap
Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
Acquiring paraphrases from text corpora
Proceedings of the fifth international conference on Knowledge capture
A Linear Classification Method in a Very High Dimensional Space Using Distributed Representation
MLDM '09 Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition
The pyramid match: efficient learning with partial correspondences
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
Near-duplicate detection for web-forums
IDEAS '09 Proceedings of the 2009 International Database Engineering & Applications Symposium
Query expansion for hash-based image object retrieval
MM '09 Proceedings of the 17th ACM international conference on Multimedia
Fast Matching for All Pairs Similarity Search
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Combinatorial Framework for Similarity Search
SISAP '09 Proceedings of the 2009 Second International Workshop on Similarity Search and Applications
Automatic retrieval of similar content using search engine query interface
Proceedings of the 18th ACM conference on Information and knowledge management
Coordinated weighted sampling for estimating aggregates over multiple weight assignments
Proceedings of the VLDB Endowment
Exploiting Sentence-Level Features for Near-Duplicate Document Detection
AIRS '09 Proceedings of the 5th Asia Information Retrieval Symposium on Information Retrieval Technology
Capture of lifecycle information in office applications
International Journal of Technology Enhanced Learning
STIMO: STIll and MOving video storyboard for the web scenario
Multimedia Tools and Applications
Learning URL patterns for webpage de-duplication
Proceedings of the third ACM international conference on Web search and data mining
Efficiently searching for similar images
Communications of the ACM
Detecting visually similar Web pages: Application to phishing detection
ACM Transactions on Internet Technology (TOIT)
A pattern tree-based approach to learning URL normalization rules
Proceedings of the 19th international conference on World wide web
Proceedings of the 19th international conference on World wide web
NBiS'07 Proceedings of the 1st international conference on Network-based information systems
Organizing news archives by near-duplicate copy detection in digital libraries
ICADL'07 Proceedings of the 10th international conference on Asian digital libraries: looking back 10 years and forging new frontiers
Detecting near-duplicates in large-scale short text databases
PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
Transactions on data hiding and multimedia security III
Efficient and accurate nearest neighbor and closest pair search in high-dimensional space
ACM Transactions on Database Systems (TODS)
TACO: tunable approximate computation of outliers in wireless sensor networks
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Similarity search and locality sensitive hashing using ternary content addressable memories
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Proceedings of the 21st ACM conference on Hypertext and hypermedia
Symbolic regression using nearest neighbor indexing
Proceedings of the 12th annual conference companion on Genetic and evolutionary computation
Adaptive near-duplicate detection via similarity learning
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Efficient partial-duplicate detection based on sequence matching
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Evaluating text reuse discovery on the web
Proceedings of the third symposium on Information interaction in context
Random hyperplane projection using derived dimensions
Proceedings of the Ninth ACM International Workshop on Data Engineering for Wireless and Mobile Access
Streaming first story detection with application to Twitter
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Online generation of locality sensitive hash signatures
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
From frequency to meaning: vector space models of semantics
Journal of Artificial Intelligence Research
On locality-sensitive indexing in generic metric spaces
Proceedings of the Third International Conference on SImilarity Search and APplications
Generalizing prefix filtering to improve set similarity joins
Information Systems
Hashing-based approaches to spelling correction of personal names
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Simple and efficient algorithm for approximate dictionary matching
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Feature map hashing: sub-linear indexing of appearance and global geometry
Proceedings of the international conference on Multimedia
XML structural similarity search using mapreduce
WAIM'10 Proceedings of the 11th international conference on Web-age information management
Automatic detection of local reuse
EC-TEL'10 Proceedings of the 5th European conference on Technology enhanced learning conference on Sustaining TEL: from innovation to learning and practice
Randomized locality sensitive vocabularies for bag-of-features model
ECCV'10 Proceedings of the 11th European conference on computer vision conference on Computer vision: Part III
Algorithm for detecting significant locations from raw GPS data
DS'10 Proceedings of the 13th international conference on Discovery science
Detecting duplicate web documents using clickthrough data
Proceedings of the fourth ACM international conference on Web search and data mining
Learning similarity function for rare queries
Proceedings of the fourth ACM international conference on Web search and data mining
Fixing the threshold for effective detection of near duplicate web documents in web crawling
ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications: Part I
Collecting, annotating, and classifying public web services
OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems - Volume Part I
Detecting near-duplicate relations in user generated forum content
OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems
Generating phrasal and sentential paraphrases: A survey of data-driven methods
Computational Linguistics
Identifying enrichment candidates in textbooks
Proceedings of the 20th international conference companion on World wide web
Efficient k-nearest neighbor graph construction for generic similarity measures
Proceedings of the 20th international conference on World wide web
PRESIDIO: A Framework for Efficient Archival Data Storage
ACM Transactions on Storage (TOS)
Theory and applications of b-bit minwise hashing
Communications of the ACM
Similarity join size estimation using locality sensitive hashing
Proceedings of the VLDB Endowment
Get the most out of your sample: optimal unbiased estimators using partial information
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
ATLAS: a probabilistic algorithm for high dimensional similarity search
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Efficient exact edit similarity query processing with the asymmetric signature scheme
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Learning reconfigurable hashing for diverse semantics
Proceedings of the 1st ACM International Conference on Multimedia Retrieval
Content redundancy in YouTube and its application to video tagging
ACM Transactions on Information Systems (TOIS)
K-median clustering, model-based compressive sensing, and sparse recovery for earth mover distance
Proceedings of the forty-third annual ACM symposium on Theory of computing
Efficient similarity joins for near-duplicate detection
ACM Transactions on Database Systems (TODS)
Semi-supervised SimHash for efficient document similarity search
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Efficient online locality sensitive hashing via reservoir counting
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Coding of Image Feature Descriptors for Distributed Rate-efficient Visual Correspondences
International Journal of Computer Vision
Query by document via a decomposition-based two-level retrieval approach
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
No free lunch: brute force vs. locality-sensitive hashing for cross-lingual pairwise similarity
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
SizeSpotSigs: an effective deduplicate algorithm considering the size of page content
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Two-locus association mapping in subquadratic time
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Fast locality-sensitive hashing
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient approximate similarity search using random projection learning
WAIM'11 Proceedings of the 12th international conference on Web-age information management
COCA filters: co-occurrence aware bloom filters
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
Partial duplicate detection for large book collections
Proceedings of the 20th ACM international conference on Information and knowledge management
Probabilistic near-duplicate detection using simhash
Proceedings of the 20th ACM international conference on Information and knowledge management
CoDet: sentence-based containment detection in news corpora
Proceedings of the 20th ACM international conference on Information and knowledge management
Efficient lp-norm multiple feature metric learning for image categorization
Proceedings of the 20th ACM international conference on Information and knowledge management
A simhash-based scheme for locating product information from the web
Proceedings of the Second Symposium on Information and Communication Technology
Mobile product search with bag of hash bits
MM '11 Proceedings of the 19th ACM international conference on Multimedia
Measuring redundancy level on the web
AINTEC '11 Proceedings of the 7th Asian Internet Engineering Conference
LSH-preserving functions and their applications
Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
Oracle attacks and covert channels
IWDW'05 Proceedings of the 4th international conference on Digital Watermarking
Improving random projections using marginal information
COLT'06 Proceedings of the 19th annual conference on Learning Theory
Cluster generation and cluster labelling for web snippets: a fast and accurate hierarchical solution
SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
Distributed similarity estimation using derived dimensions
The VLDB Journal — The International Journal on Very Large Data Bases
Temporal shingling for version identification in web archives
ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval
A minimally supervised approach for detecting and ranking document translation pairs
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Bayesian locality sensitive hashing for fast similarity search
Proceedings of the VLDB Endowment
Reranking bilingually extracted paraphrases using monolingual distributional similarity
GEMS '11 Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics
Efficient semantic-aware detection of near duplicate resources
ESWC'10 Proceedings of the 7th international conference on The Semantic Web: research and Applications - Volume Part II
A fusion of algorithms in near duplicate document detection
PAKDD'11 Proceedings of the 15th international conference on New Frontiers in Applied Data Mining
Clustering and load balancing optimization for redundant content removal
Proceedings of the 21st international conference companion on World Wide Web
AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
Data mining for improving textbooks
ACM SIGKDD Explorations Newsletter
V-SMART-join: a scalable mapreduce framework for all-pair similarity joins of multisets and vectors
Proceedings of the VLDB Endowment
Fast sampling word correlations of high dimensional text data (abstract only)
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
CRSI: a compact randomized similarity index for set-valued features
Proceedings of the 15th International Conference on Extending Database Technology
SIMP: accurate and efficient near neighbor search in high dimensional spaces
Proceedings of the 15th International Conference on Extending Database Technology
On generating large-scale ground truth datasets for the deduplication of bibliographic records
Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics
An adaptive approach to dealing with unstable behaviour of users in collaborative filtering systems
Journal of Information Science
Learning hash functions for cross-view similarity search
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Two
Distributed approximate spectral clustering for large-scale datasets
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Multi-resolution similarity hashing
Digital Investigation: The International Journal of Digital Forensics & Incident Response
Compact hashing for mixed image-keyword query over multi-label images
Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
Distributed KNN-graph approximation via hashing
Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Maximum inner-product search using cone trees
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
A probabilistic model for multimodal hash function learning
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Boosting multi-kernel locality-sensitive hashing for scalable image retrieval
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Detecting quilted web pages at scale
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Who tags what?: an analysis framework
Proceedings of the VLDB Endowment
ICALP'12 Proceedings of the 39th international colloquium conference on Automata, Languages, and Programming - Volume Part I
Monolingual distributional similarity for text-to-text generation
SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
ATA-sem: chunk-based determination of semantic text similarity
SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
Thick boundaries in binary space and their influence on nearest-neighbor search
Pattern Recognition Letters
Streaming analysis of discourse participants
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Fast large-scale approximate graph construction for NLP
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Dynamic covering for recommendation systems
Proceedings of the 21st ACM international conference on Information and knowledge management
Content-based crowd retrieval on the real-time web
Proceedings of the 21st ACM international conference on Information and knowledge management
Efficient retrieval of recommendations in a matrix factorization framework
Proceedings of the 21st ACM international conference on Information and knowledge management
Learning to rank duplicate bug reports
Proceedings of the 21st ACM international conference on Information and knowledge management
De-duplication of aggregation authority files
International Journal of Metadata, Semantics and Ontologies
Semi-supervised spectral hashing for fast similarity search
Neurocomputing
Efficient Jaccard-based diversity analysis of large document collections
Proceedings of the 21st ACM international conference on Information and knowledge management
Efficient distributed locality sensitive hashing
Proceedings of the 21st ACM international conference on Information and knowledge management
Accelerated large scale optimization by concomitant hashing
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part I
Efficient mining of repetitions in large-scale TV streams with product quantization hashing
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part I
Cross-Language high similarity search using a conceptual thesaurus
CLEF'12 Proceedings of the Third international conference on Information Access Evaluation: multilinguality, multimodality, and visual analytics
Learning compact visual attributes for large-scale image classification
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part III
Fast near neighbor search in high-dimensional binary data
ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I
TopicExplorer: exploring document collections with topic models
ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II
Pairwise similarity of TopSig document signatures
Proceedings of the Seventeenth Australasian Document Computing Symposium
Instance-Based matching of large ontologies using locality-sensitive hashing
ISWC'12 Proceedings of the 11th international conference on The Semantic Web - Volume Part I
A novel approach for leveraging co-occurrence to improve the false positive error in signature files
Journal of Discrete Algorithms
Robust plagiary detection using semantic compression augmented SHAPD
ICCCI'12 Proceedings of the 4th international conference on Computational Collective Intelligence: technologies and applications - Volume Part I
Sparse hashing for fast multimedia search
ACM Transactions on Information Systems (TOIS)
Signature matching distance for content-based image retrieval
Proceedings of the 3rd ACM conference on International conference on multimedia retrieval
International Journal of Applied Cryptography
Reading the correct history?: modeling temporal intention in resource sharing
Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
Reducing information redundancy in search results
Proceedings of the 28th Annual ACM Symposium on Applied Computing
Neighbourhood preserving quantisation for LSH
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Diversity maximization under matroid constraints
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Distributed large-scale natural graph factorization
Proceedings of the 22nd international conference on World Wide Web
Efficient community detection in large networks using content and links
Proceedings of the 22nd international conference on World Wide Web
Groundhog day: near-duplicate detection on Twitter
Proceedings of the 22nd international conference on World Wide Web
Near duplicate detection in an academic digital library
Proceedings of the 2013 ACM symposium on Document engineering
Efficient Nearest-Neighbor Search in the Probability Simplex
Proceedings of the 2013 Conference on the Theory of Information Retrieval
The fingerprint analysis technique-oriented research on microblog for public opinion analysis
Proceedings of the Fifth International Conference on Internet Multimedia Computing and Service
Event detection and trending in multiple social networking sites
Proceedings of the 16th Communications & Networking Symposium
Improved binary feature matching through fusion of hamming distance and fragile bit weight
Proceedings of the 3rd ACM international workshop on Interactive multimedia on mobile & portable devices
Efficient filtering and ranking schemes for finding inclusion dependencies on the web
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Locality sensitive hashing revisited: filling the gap between theory and algorithm analysis
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Asymmetric signature schemes for efficient exact edit similarity query processing
ACM Transactions on Database Systems (TODS)
Malware analysis method using visualization of binary files
Proceedings of the 2013 Research in Adaptive and Convergent Systems
Ranking-based name matching for author disambiguation in bibliographic data
Proceedings of the 2013 KDD Cup 2013 Workshop
On the use of decentralization to enable privacy in web-scale recommendation services
Proceedings of the 12th ACM workshop on Workshop on privacy in the electronic society
Moving heaven and earth: distances between distributions
ACM SIGACT News
In-network approximate computation of outliers with quality guarantees
Information Systems
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Large-scale multilabel propagation based on efficient sparse graph construction
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
Multiple feature kernel hashing for large-scale visual search
Pattern Recognition
Streaming similarity search over one billion tweets using parallel locality-sensitive hashing
Proceedings of the VLDB Endowment
Dimension independent similarity computation
The Journal of Machine Learning Research
A locality sensitive hashing scheme is a distribution on a family $\mathcal{F}$ of hash functions operating on a collection of objects, such that for two objects $x, y$, $\Pr_{h \in \mathcal{F}}[h(x) = h(y)] = \mathrm{sim}(x, y)$, where $\mathrm{sim}(x, y) \in [0, 1]$ is some similarity function defined on the collection of objects. Such a scheme leads to a compact representation of objects so that similarity of objects can be estimated from their compact sketches, and also leads to efficient algorithms for approximate nearest neighbor search and clustering. Min-wise independent permutations provide an elegant construction of such a locality sensitive hashing scheme for a collection of subsets with the set similarity measure $\mathrm{sim}(A, B) = \frac{|A \cap B|}{|A \cup B|}$.

We show that rounding algorithms for LPs and SDPs used in the context of approximation algorithms can be viewed as locality sensitive hashing schemes for several interesting collections of objects. Based on this insight, we construct new locality sensitive hashing schemes for:

A collection of vectors with the distance between $\vec{u}$ and $\vec{v}$ measured by $\theta(\vec{u}, \vec{v})/\pi$, where $\theta(\vec{u}, \vec{v})$ is the angle between $\vec{u}$ and $\vec{v}$. This yields a sketching scheme for estimating the cosine similarity measure between two vectors, as well as a simple alternative to minwise independent permutations for estimating set similarity.

A collection of distributions on $n$ points in a metric space, with distance between distributions measured by the Earth Mover Distance (EMD), a popular distance measure in graphics and vision. Our hash functions map distributions to points in the metric space such that, for distributions $P$ and $Q$, $\mathrm{EMD}(P, Q) \le \mathbb{E}_{h \in \mathcal{F}}[d(h(P), h(Q))] \le O(\log n \log\log n) \cdot \mathrm{EMD}(P, Q)$.
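The vector scheme can be illustrated with a minimal sketch. The abstract does not spell out the hash functions, so the code assumes the standard random-hyperplane construction: each hash bit is the sign of the dot product with a random Gaussian vector, and two vectors disagree on a bit with probability equal to their angle divided by pi. The function names (`random_hyperplanes`, `sketch`, `estimate_angle`) and the parameters are illustrative only and are not taken from the source.

```python
import math
import random


def random_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian vectors, one per hash function (assumed construction)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def sketch(vec, planes):
    """Map a vector to a bit sketch: one sign bit per random hyperplane."""
    return [
        1 if sum(r_i * x_i for r_i, x_i in zip(r, vec)) >= 0 else 0
        for r in planes
    ]


def estimate_angle(sketch_u, sketch_v):
    """Estimate the angle (radians) from the fraction of disagreeing bits.

    Under the assumed construction, a single bit differs with probability
    angle / pi, so the disagreement fraction times pi estimates the angle.
    """
    disagreements = sum(a != b for a, b in zip(sketch_u, sketch_v))
    return math.pi * disagreements / len(sketch_u)


def estimate_cosine(sketch_u, sketch_v):
    """Estimate cosine similarity as the cosine of the estimated angle."""
    return math.cos(estimate_angle(sketch_u, sketch_v))


if __name__ == "__main__":
    planes = random_hyperplanes(dim=3, num_bits=2048, seed=42)
    u = [1.0, 2.0, 0.5]
    v = [0.8, 1.5, 1.0]
    print("estimated angle:", estimate_angle(sketch(u, planes), sketch(v, planes)))
    print("estimated cosine:", estimate_cosine(sketch(u, planes), sketch(v, planes)))
```

The estimate sharpens as the number of bits grows, since the disagreement fraction is an average of independent indicator variables. Only the bit sketches need to be stored to compare objects later, which is the compact-representation property the abstract describes.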