Min-wise independent permutations (extended abstract)
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
A small approximately min-wise independent family of hash functions
Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
Selectivity estimation for Boolean queries
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Finding replicated Web collections
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
On permutations with limited independence
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms
Min-Wise versus linear independence (extended abstract)
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms
A protocol-independent technique for eliminating redundant network traffic
Proceedings of the conference on Applications, Technologies, Architectures, and Protocols for Computer Communication
A low-bandwidth network file system
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Similarity estimation techniques from rounding algorithms
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Evaluating strategies for similarity search on the web
Proceedings of the 11th international conference on World Wide Web
Mining database structure; or, how to build a data quality browser
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Finding Interesting Associations without Support Pruning
IEEE Transactions on Knowledge and Data Engineering
Informed content delivery across adaptive overlay networks
Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications
Using histograms to estimate answer sizes for XML queries
Information Systems - Special issue: Best papers from EDBT 2002
Estimating Answer Sizes for XML Queries
EDBT '02 Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology
ISAAC '01 Proceedings of the 12th International Symposium on Algorithms and Computation
Estimating Resemblance of MIDI Documents
ALENEX '01 Revised Papers from the Third International Workshop on Algorithm Engineering and Experimentation
A Derandomization Using Min-Wise Independent Permutations
RANDOM '98 Proceedings of the Second International Workshop on Randomization and Approximation Techniques in Computer Science
Identifying and Filtering Near-Duplicate Documents
CPM '00 Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching
Efficient algorithms for shared camera control
Proceedings of the nineteenth annual symposium on Computational geometry
Algorithmic aspects of information retrieval on the web
Handbook of massive data sets
Searching large text collections
Handbook of massive data sets
On the sample size of k-restricted min-wise independent permutations and other k-wise distributions
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Challenges in web search engines
ACM SIGIR Forum
Pastiche: making backup cheap and easy
ACM SIGOPS Operating Systems Review - OSDI '02: Proceedings of the 5th symposium on Operating systems design and implementation
Generalized substring selectivity estimation
Journal of Computer and System Sciences - Special issue on PODS 2000
Winnowing: local algorithms for document fingerprinting
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Robust and efficient fuzzy match for online data cleaning
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
TreeJuxtaposer: scalable tree comparison using Focus+Context with guaranteed visibility
ACM SIGGRAPH 2003 Papers
Bullet: high bandwidth data dissemination using an overlay mesh
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
A derandomization using min-wise independent permutations
Journal of Discrete Algorithms
Techniques for efficient fragment detection in web pages
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Automatic detection of fragments in dynamically generated web pages
Proceedings of the 13th international conference on World Wide Web
Improved robustness of signature-based near-replica detection via lexicon randomization
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Informed content delivery across adaptive overlay networks
IEEE/ACM Transactions on Networking (TON)
Pastiche: making backup cheap and easy
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementation
Hierarchical substring caching for efficient content distribution to low-bandwidth clients
WWW '05 Proceedings of the 14th international conference on World Wide Web
Three-level caching for efficient query processing in large Web search engines
WWW '05 Proceedings of the 14th international conference on World Wide Web
Scaling link-based similarity search
WWW '05 Proceedings of the 14th international conference on World Wide Web
LSH forest: self-tuning indexes for similarity search
WWW '05 Proceedings of the 14th international conference on World Wide Web
Server-friendly delta compression for efficient web access
Web content caching and distribution
Sampling algorithms in a stream operator
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
K-gram based software birthmarks
Proceedings of the 2005 ACM symposium on Applied computing
Automatic Fragment Detection in Dynamic Web Pages and Its Impact on Caching
IEEE Transactions on Knowledge and Data Engineering
Comparison of texts streams in the presence of mild adversaries
ACSW Frontiers '05 Proceedings of the 2005 Australasian workshop on Grid computing and e-research - Volume 44
Metadata aggregation and "automated digital libraries": a retrospective on the NSDL experience
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
Approximate maximum weight branchings
Information Processing Letters
Efficient and decentralized PageRank approximation in a peer-to-peer web search network
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
A dictionary for approximate string search and longest prefix search
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Improving duplicate elimination in storage systems
ACM Transactions on Storage (TOS)
Randomized algorithms and NLP: using locality sensitive hash function for high speed noun clustering
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Scaling distributional similarity to large corpora
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Using sketches to estimate associations
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
p2pDating: Real life inspired semantic overlay networks for Web search
Information Processing and Management: an International Journal
Information redundancy across metadata collections
Information Processing and Management: an International Journal
Accurate discovery of co-derivative documents via duplicate text detection
Information Systems
Efficient plagiarism detection for large code repositories
Software—Practice & Experience
Estimating the selectivity of approximate string queries
ACM Transactions on Database Systems (TODS)
Detecting near-duplicates for web crawling
Proceedings of the 16th international conference on World Wide Web
Detectives: detecting coalition hit inflation attacks in advertising networks streams
Proceedings of the 16th international conference on World Wide Web
Google news personalization: scalable online collaborative filtering
Proceedings of the 16th international conference on World Wide Web
Proceedings of the 16th international conference on World Wide Web
A cost-effective method for detecting web site replicas on search engine databases
Data & Knowledge Engineering
Redundancy elimination within large collections of files
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Alternatives for detecting redundancy in storage systems data
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
TAPER: tiered approach for eliminating redundancy in replica synchronization
FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Bottom-k sketches: better and more efficient estimation of aggregates
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Representing aggregate works in the digital library
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Practical Algorithms and Lower Bounds for Similarity Search in Massive Graphs
IEEE Transactions on Knowledge and Data Engineering
Summarizing data using bottom-k sketches
Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Very sparse stable random projections for dimension reduction in lα (0
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Probabilistic fingerprints for shapes
SGP '06 Proceedings of the fourth Eurographics symposium on Geometry processing
Scalable near identical image and shot detection
Proceedings of the 6th ACM international conference on Image and video retrieval
Combinatorial algorithms for web search engines: three success stories
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
On (ε,k)-min-wise independent permutations
Random Structures & Algorithms
A Sketch Algorithm for Estimating Two-Way and Multi-Way Associations
Computational Linguistics
Highly efficient techniques for network forensics
Proceedings of the 14th ACM conference on Computer and communications security
Optimized query execution in large search engines with global page ordering
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Effective ranked conceptual retrieval
Proceedings of the ACM first workshop on CyberInfrastructure: information management in eScience
Automated detection of api refactorings in libraries
Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering
Tracking Web spam with HTML style similarities
ACM Transactions on the Web (TWEB)
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
Communications of the ACM - 50th anniversary issue: 1958 - 2008
High-bandwidth data dissemination for large-scale distributed systems
ACM Transactions on Computer Systems (TOCS)
WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
Estimators and tail bounds for dimension reduction in lα (0
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Spamscatter: characterizing internet scam hosting infrastructure
SS'07 Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium
Efficient similarity joins for near duplicate detection
Proceedings of the 17th international conference on World Wide Web
Genealogical trees on the web: a search engine user perspective
Proceedings of the 17th international conference on World Wide Web
The power of two min-hashes for similarity search among hierarchical data objects
Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Generating links by mining quotations
Proceedings of the nineteenth ACM conference on Hypertext and hypermedia
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Lexicon randomization for near-duplicate detection with I-Match
The Journal of Supercomputing
Efficient semi-streaming algorithms for local triangle counting in massive graphs
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
De-duping URLs via rewrite rules
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
S2S: structural-to-syntactic matching similar documents
Knowledge and Information Systems
Approximate schemas, source-consistency and query answering
Journal of Intelligent Information Systems
Trusting spam reporters: A reporter-based reputation system for email filtering
ACM Transactions on Information Systems (TOIS)
ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
Advanced Network Fingerprinting
RAID '08 Proceedings of the 11th international symposium on Recent Advances in Intrusion Detection
Tighter estimation using bottom k sketches
Proceedings of the VLDB Endowment
Note: Order statistics and estimating cardinalities of massive data sets
Discrete Applied Mathematics
Type-based categorization of relational attributes
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Approximate substring selectivity estimation
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Detecting Theft of Java Applications via a Static Birthmark Based on Weighted Stack Patterns
IEICE - Transactions on Information and Systems
Sparse indexing: large scale, inline deduplication using sampling and locality
FAST '09 Proceedings of the 7th conference on File and storage technologies
Efficient overlap and content reuse detection in blogs and online news articles
Proceedings of the 18th international conference on World wide web
The design of a similarity based deduplication system
SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Using phrases as features in email classification
Journal of Systems and Software
Online pairing of VoIP conversations
The VLDB Journal — The International Journal on Very Large Data Bases
Leveraging discarded samples for tighter estimation of multiple-set aggregates
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Applying syntactic similarity algorithms for enterprise information management
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient Clustering of Web-Derived Data Sets
MLDM '09 Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition
Frequent Itemset Mining for Clustering Near Duplicate Web Documents
ICCS '09 Proceedings of the 17th International Conference on Conceptual Structures: Conceptual Structures: Leveraging Semantic Technologies
Efficient Set Similarity Joins Using Min-prefixes
ADBIS '09 Proceedings of the 13th East European Conference on Advances in Databases and Information Systems
Challenges in web search engines
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Compact full-text indexing of versioned document collections
Proceedings of the 18th ACM conference on Information and knowledge management
Automatic retrieval of similar content using search engine query interface
Proceedings of the 18th ACM conference on Information and knowledge management
URL normalization for de-duplication of web pages
Proceedings of the 18th ACM conference on Information and knowledge management
Coordinated weighted sampling for estimating aggregates over multiple weight assignments
Proceedings of the VLDB Endowment
Power-law based estimation of set similarity join size
Proceedings of the VLDB Endowment
Exploiting Sentence-Level Features for Near-Duplicate Document Detection
AIRS '09 Proceedings of the 5th Asia Information Retrieval Symposium on Information Retrieval Technology
"Language Is the Skin of My Thought": Integrating Wikipedia and AI to Support a Guillotine Player
AI*IA '09: Proceedings of the XIth International Conference of the Italian Association for Artificial Intelligence Reggio Emilia on Emergent Perspectives in Artificial Intelligence
Approximate Structural Consistency
SOFSEM '10 Proceedings of the 36th Conference on Current Trends in Theory and Practice of Computer Science
New payload attribution methods for network forensic investigations
ACM Transactions on Information and System Security (TISSEC)
Web-scale distributional similarity and entity set expansion
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
Learning URL patterns for webpage de-duplication
Proceedings of the third ACM international conference on Web search and data mining
User Modeling and User-Adapted Interaction
Connection network and optimization of interest metric for one-to-one marketing
GECCO'03 Proceedings of the 2003 international conference on Genetic and evolutionary computation: Part II
Systems support for remote visualization of genomics applications over wide area networks
GCCB'06 Proceedings of the 2006 international conference on Distributed, high-performance and grid computing in computational biology
Proceedings of the 19th international conference on World wide web
How should we solve search problems privately?
CRYPTO'07 Proceedings of the 27th annual international cryptology conference on Advances in cryptology
Differences and identities in document retrieval in an annotation environment
DNIS'07 Proceedings of the 5th international conference on Databases in networked information systems
Understanding content reuse on the web: static and dynamic analyses
WebKDD'06 Proceedings of the 8th Knowledge discovery on the web international conference on Advances in web mining and web usage analysis
Weighted shingling: an adaptation of shingling for weighted shingles
IIT'09 Proceedings of the 6th international conference on Innovations in information technology
Sampling dirty data for matching attributes
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
On indexing error-tolerant set containment
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Proceedings of the 21st ACM conference on Hypertext and hypermedia
Efficient similarity estimation for systems exploiting data redundancy
INFOCOM'10 Proceedings of the 29th conference on Information communications
Efficient partial-duplicate detection based on sequence matching
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Efficient algorithms for large-scale local triangle counting
ACM Transactions on Knowledge Discovery from Data (TKDD)
Efficient privacy-preserving similar document detection
The VLDB Journal — The International Journal on Very Large Data Bases
From frequency to meaning: vector space models of semantics
Journal of Artificial Intelligence Research
Estimating set intersection using small samples
ACSC '10 Proceedings of the Thirty-Third Australasian Conference on Computer Science - Volume 102
On locality-sensitive indexing in generic metric spaces
Proceedings of the Third International Conference on SImilarity Search and APplications
Similarity joins as stronger metric operations
SIGSPATIAL Special
Generalizing prefix filtering to improve set similarity joins
Information Systems
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Plagiarism detection across distant language pairs
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Real-time large scale near-duplicate web video retrieval
Proceedings of the international conference on Multimedia
Efficient non-linear editing for non-volatile mobile storage
Proceedings of the 2010 ACM multimedia workshop on Mobile cloud media computing
Code analyzer for an online course management system
Journal of Systems and Software
Rebuilding the world from views
WAIM'10 Proceedings of the 11th international conference on Web-age information management
XML structural similarity search using mapreduce
WAIM'10 Proceedings of the 11th international conference on Web-age information management
Partition min-hash for partial duplicate image discovery
ECCV'10 Proceedings of the 11th European conference on Computer vision: Part I
Binary coherent edge descriptors
ECCV'10 Proceedings of the 11th European conference on Computer vision: Part II
Automatic detection of local reuse
EC-TEL'10 Proceedings of the 5th European conference on Technology enhanced learning conference on Sustaining TEL: from innovation to learning and practice
Set similarity join on probabilistic data
Proceedings of the VLDB Endowment
On multi-column foreign key discovery
Proceedings of the VLDB Endowment
Facilitating interaction and retrieval for annotated documents
International Journal of Computational Science and Engineering
Generalised Sequence Signatures through symbolic clustering
International Journal of Data Mining and Bioinformatics
Detecting near-duplicate relations in user generated forum content
OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems
Generating phrasal and sentential paraphrases: A survey of data-driven methods
Computational Linguistics
Exponential time improvement for min-wise based algorithms
Information and Computation
Approximate Satisfiability and Equivalence
SIAM Journal on Computing
Developing a corpus of plagiarised short answers
Language Resources and Evaluation
PRESIDIO: A Framework for Efficient Archival Data Storage
ACM Transactions on Storage (TOS)
Theory and applications of b-bit minwise hashing
Communications of the ACM
Similarity join size estimation using locality sensitive hashing
Proceedings of the VLDB Endowment
Reuse in the wild: an empirical and ethnographic study of organizational content reuse
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Comparison of similarity metrics for refactoring detection
Proceedings of the 8th Working Conference on Mining Software Repositories
Get the most out of your sample: optimal unbiased estimators using partial information
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Efficient exact edit similarity query processing with the asymmetric signature scheme
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Consistent visual words mining with adaptive sampling
Proceedings of the 1st ACM International Conference on Multimedia Retrieval
Schema mapping with quality assurance for data integration
APWeb'11 Proceedings of the 13th Asia-Pacific web conference on Web technologies and applications
Studying software evolution using artefacts' shared information content
Science of Computer Programming
Efficient similarity joins for near-duplicate detection
ACM Transactions on Database Systems (TODS)
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Composite hashing with multiple information sources
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Hypergeometric language models for republished article finding
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
No free lunch: brute force vs. locality-sensitive hashing for cross-lingual pairwise similarity
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Retrieving similar documents from the web
Journal of Web Engineering
What's the difference?: efficient set reconciliation without prior context
Proceedings of the ACM SIGCOMM 2011 conference
Information retrieval techniques for corpus filtering applied to external plagiarism detection
NLDB'11 Proceedings of the 16th international conference on Natural language processing and information systems
Cloak and dagger: dynamics of web search cloaking
Proceedings of the 18th ACM conference on Computer and communications security
Towards a universal sketch for origin-destination network measurements
NPC'11 Proceedings of the 8th IFIP international conference on Network and parallel computing
Plagiarism detection based on structural information
Proceedings of the 20th ACM international conference on Information and knowledge management
CoDet: sentence-based containment detection in news corpora
Proceedings of the 20th ACM international conference on Information and knowledge management
Detection of near-duplicate user generated contents: the SMS spam collection
Proceedings of the 3rd international workshop on Search and mining user-generated contents
DeFFS: Duplication-eliminated flash file system
Computers and Electrical Engineering
A Counterexample to Strong Parallel Repetition
SIAM Journal on Computing
ICDT'07 Proceedings of the 11th international conference on Database Theory
LSH-preserving functions and their applications
Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
Enhancing duplicate collection detection through replica boundary discovery
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
The case of the duplicate documents measurement, search, and science
APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development
Caching frequent XML query patterns
APWeb'06 Proceedings of the 2006 international conference on Advanced Web and Network Technologies, and Applications
IQN routing: integrating quality and novelty in P2P querying and ranking
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Exponential time improvement for min-wise based algorithms
Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms
OverCite: a cooperative digital research library
IPTPS'05 Proceedings of the 4th international conference on Peer-to-Peer Systems
ISMIS'05 Proceedings of the 15th international conference on Foundations of Intelligent Systems
A scalable randomized method to compute link-based similarity rank on the web graph
EDBT'04 Proceedings of the 2004 international conference on Current Trends in Database Technology
Measuring similarity of large software systems based on source code correspondence
PROFES'05 Proceedings of the 6th international conference on Product Focused Software Process Improvement
An improved plagiarism detection scheme based on semantic role labeling
Applied Soft Computing
On approximation algorithms for data mining applications
Efficient Approximation and Online Algorithms
Efficient processing of probabilistic set-containment queries on uncertain set-valued data
Information Sciences: an International Journal
Automated detection of refactorings in evolving components
ECOOP'06 Proceedings of the 20th European conference on Object-Oriented Programming
Word length n-grams for text re-use detection
CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
A shared fragments analysis system for large collections of web pages
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems
WAN optimized replication of backup datasets using stream-informed delta compression
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
V-SMART-join: a scalable mapreduce framework for all-pair similarity joins of multisets and vectors
Proceedings of the VLDB Endowment
Survey: An overview on XML similarity: Background, current trends and future directions
Computer Science Review
On how often code is cloned across repositories
Proceedings of the 34th International Conference on Software Engineering
Delta compressed and deduplicated storage using stream-informed locality
HotStorage'12 Proceedings of the 4th USENIX conference on Hot Topics in Storage and File Systems
Time-based calibration of effectiveness measures
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Detecting quilted web pages at scale
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Learning hash codes for efficient content reuse detection
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Analysis and extraction of sentence-level paraphrase sub-corpus in CS education
Proceedings of the 13th annual conference on Information technology education
Genodroid: are privacy-preserving genomic tests ready for prime time?
Proceedings of the 2012 ACM workshop on Privacy in the electronic society
WAN-optimized replication of backup datasets using stream-informed delta compression
ACM Transactions on Storage (TOS)
UKP: computing semantic textual similarity by combining multiple content similarity measures
SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
Improving XML instances comparison with preprocessing algorithms
DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications
Efficient jaccard-based diversity analysis of large document collections
Proceedings of the 21st ACM international conference on Information and knowledge management
Stochastic simulation of time-biased gain
Proceedings of the 21st ACM international conference on Information and knowledge management
Approximate verification and enumeration problems
ICTAC'12 Proceedings of the 9th international conference on Theoretical Aspects of Computing
Increasing recall for text re-use in historical documents to support research in the humanities
TPDL'12 Proceedings of the Second international conference on Theory and Practice of Digital Libraries
Fast near neighbor search in high-dimensional binary data
ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I
Experiments with filtered detection of similar academic papers
AIMSA'12 Proceedings of the 15th international conference on Artificial Intelligence: methodology, systems, and applications
Instance-Based matching of large ontologies using locality-sensitive hashing
ISWC'12 Proceedings of the 11th international conference on The Semantic Web - Volume Part I
ISWC'12 Proceedings of the 11th international conference on The Semantic Web - Volume Part II
An evaluation of two automatic landmark building discovery algorithms for city reconstruction
ECCV'10 Proceedings of the 11th European conference on Trends and Topics in Computer Vision - Volume Part II
ACRONYM: Context Metrics for Linking People to User-Generated Media Content
International Journal on Semantic Web & Information Systems
Graph-based semi-supervised learning with multi-modality propagation for large-scale image datasets
Journal of Visual Communication and Image Representation
Early-Detection system for cross-language (translated) plagiarism
ICT-EurAsia'13 Proceedings of the 2013 international conference on Information and Communication Technology
Efficient top-k algorithms for approximate substring matching
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
International Journal of Applied Cryptography
Bottom-k and priority sampling, set similarity and subset sums with minimal independence
Proceedings of the forty-fifth annual ACM symposium on Theory of computing
A distributed framework for scaling Up LSH-based computations in privacy preserving record linkage
Proceedings of the 6th Balkan Conference in Informatics
Revision graph extraction in Wikipedia based on supergram decomposition
Proceedings of the 9th International Symposium on Open Collaboration
Spatial min-Hash for similar image search
Proceedings of the Fifth International Conference on Internet Multimedia Computing and Service
Event detection and trending in multiple social networking sites
Proceedings of the 16th Communications & Networking Symposium
Sim-min-hash: an efficient matching technique for linking large image collections
Proceedings of the 21st ACM international conference on Multimedia
Orthogonal query recommendation
Proceedings of the 7th ACM conference on Recommender systems
Asymmetric signature schemes for efficient exact edit similarity query processing
ACM Transactions on Database Systems (TODS)
Sketching for big data recommender systems using fast pseudo-random fingerprints
ICALP'13 Proceedings of the 40th international conference on Automata, Languages, and Programming - Volume Part II
b-bit minwise hashing in practice
Proceedings of the 5th Asia-Pacific Symposium on Internetware
MiG: efficient migration of desktop VMs using semantic compression
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Large-scale Structure-from-Motion Reconstruction with small memory consumption
Proceedings of International Conference on Advances in Mobile Computing & Multimedia
Efficient top-k retrieval with signatures
Proceedings of the 18th Australasian Document Computing Symposium
Identifying useful human correction feedback from an on-line machine translation service
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Deciding unique decodability of bigram counts via finite automata
Journal of Computer and System Sciences
Ranking consistency for image matching and object retrieval
Pattern Recognition
Information Sciences: an International Journal
Dimension independent similarity computation
The Journal of Machine Learning Research
Efficient estimation for high similarities using odd sketches
Proceedings of the 23rd international conference on World wide web
CoBAn: A context based model for data leakage prevention
Information Sciences: an International Journal
Object-based visual query suggestion
Multimedia Tools and Applications
Non-uniformity issues and workarounds in bounded-size sampling
The VLDB Journal — The International Journal on Very Large Data Bases
Migratory compression: coarse-grained data reordering to improve compressibility
FAST'14 Proceedings of the 12th USENIX conference on File and Storage Technologies
EsPRESSO: Efficient privacy-preserving evaluation of sample set similarity
Journal of Computer Security