It is relatively easy to detect exact matches in Web documents; detecting similar content expressed with different words and sentence structures is a much more difficult task. A reliable tool for determining the degree of similarity between any two Web documents could help filter or retain Web documents with similar content. Most methods for detecting similarity between documents rely on some form of textual fingerprinting or on searching for exactly matched substrings. These methods may not be sufficient, because changing the sentence structure or replacing words with synonyms can cause sentences with similar or identical content to be treated as different. In this paper, we develop a sentence-based Fuzzy Set Information Retrieval (IR) approach that discovers similar documents by using word clusters, which capture the similarity between different words. Our approach has the advantage of detecting documents with similar, but not necessarily identical, sentences based on fuzzy-word sets. We consider three fuzzy-word clustering techniques, namely the correlation cluster, the association cluster, and the metric cluster, each of which generates word-to-word correlation values. Experimental results show that, by adopting the metric cluster, our similarity detection approach achieves a high accuracy rate in detecting similar documents and improves on previous Fuzzy Set IR approaches based solely on the correlation cluster.
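To make the idea of matching sentences through word-to-word correlation values more concrete, the following is a minimal sketch and not the paper's implementation. It assumes the membership formula commonly used in fuzzy set IR models, in which a word's degree of membership in a sentence is one minus the product of (1 - correlation) over the words of that sentence; the abstract does not state which formula the authors use. The correlation table, its values, and the function names `correlation`, `membership`, and `sentence_similarity` are all hypothetical. In practice the values would come from a correlation, association, or metric cluster built over a corpus.

```python
from typing import Dict, FrozenSet, List

# Hypothetical word-to-word correlation values in [0, 1].
# Assumption: a real system would derive these from a word cluster.
CORRELATION: Dict[FrozenSet[str], float] = {
    frozenset({"car", "automobile"}): 0.9,
    frozenset({"buy", "purchase"}): 0.8,
    frozenset({"car", "purchase"}): 0.1,
}


def correlation(a: str, b: str) -> float:
    """Return the correlation of two words; identical words correlate fully."""
    if a == b:
        return 1.0
    return CORRELATION.get(frozenset({a, b}), 0.0)


def membership(word: str, sentence: List[str]) -> float:
    """Degree to which `word` belongs to the fuzzy set of `sentence`.

    Assumed formula: 1 - prod(1 - correlation(word, w) for w in sentence).
    """
    product = 1.0
    for other in sentence:
        product *= 1.0 - correlation(word, other)
    return 1.0 - product


def sentence_similarity(source: List[str], target: List[str]) -> float:
    """Average membership of the source words in the target sentence.

    The measure is asymmetric: it asks how well `target` covers `source`.
    """
    if not source:
        return 0.0
    return sum(membership(w, target) for w in source) / len(source)


if __name__ == "__main__":
    s1 = ["buy", "car"]
    s2 = ["purchase", "automobile"]
    print(sentence_similarity(s1, s2))
```

With this table, the two example sentences share no word, so an exact substring match would fail, yet they still receive a high similarity score. Swapping in values from a different cluster changes only the `CORRELATION` table, not the matching logic.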