Measuring semantic similarity between words by removing noise and redundancy in web snippets

Authors:
Zheng Xu; Xiangfeng Luo; Jie Yu; Weimin Xu
Affiliations:
School of Computer Engineering and Science, High Performance Computing Center, Shanghai University, Shanghai, 200072, China (all four authors)
Venue:
Concurrency and Computation: Practice & Experience
Year:
2011

Abstract

Semantic similarity measures play important roles in many Web-related tasks such as Web browsing and query suggestion. Because taxonomy-based methods cannot deal with continually emerging words, Web-based methods have recently been proposed to solve this problem. However, because of the noise and redundancy hidden in Web data, robustness and accuracy remain challenging. In this paper, we propose a method that integrates the page counts and snippets returned by Web search engines. First, semantic snippets and the number of search results are used to remove noise and redundancy from the Web snippets (a ‘Web snippet’ comprises the title, summary, and URL of a Web page returned by a search engine). Then, a measure integrating page counts, semantic snippets, and the number of already displayed search results is proposed. The proposed method requires no human-annotated knowledge (e.g., ontologies) and can easily be applied to Web-related tasks (e.g., query suggestion). A correlation coefficient of 0.851 against the Rubenstein–Goodenough benchmark dataset shows that the proposed method outperforms existing Web-based methods by a wide margin. Moreover, the proposed semantic similarity measure significantly improves the quality of query suggestion compared with several page-count-based methods. Copyright © 2011 John Wiley & Sons, Ltd.
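The page-count baselines and the benchmark evaluation mentioned above can be made concrete with a small sketch. This is not the authors' method, because the abstract does not give its formula. Instead, it implements two standard page-count measures, WebJaccard and WebPMI, computed from the hit counts H(P), H(Q) and H(P AND Q) that a search engine reports for two words P and Q. It also includes Pearson's r, which is assumed here to be the correlation coefficient used against the Rubenstein–Goodenough ratings. The index size N = 10^10 and the noise threshold c = 5 are assumed values, and all function names are hypothetical.

```python
from __future__ import annotations

import math

# Assumed values (not taken from the paper): the number of pages indexed by
# the search engine, and the co-occurrence count at or below which
# H(P AND Q) is treated as noise rather than evidence of similarity.
N_INDEXED = 1e10
MIN_COUNT = 5


def web_jaccard(h_p: float, h_q: float, h_pq: float, c: float = MIN_COUNT) -> float:
    """Jaccard coefficient from page counts H(P), H(Q) and H(P AND Q)."""
    if h_pq <= c:
        return 0.0  # too few co-occurrences: treat as noise
    return h_pq / (h_p + h_q - h_pq)


def web_pmi(h_p: float, h_q: float, h_pq: float,
            n: float = N_INDEXED, c: float = MIN_COUNT) -> float:
    """Pointwise mutual information (base 2) estimated from page counts."""
    if h_pq <= c:
        return 0.0  # too few co-occurrences: treat as noise
    p_pq = h_pq / n
    return math.log2(p_pq / ((h_p / n) * (h_q / n)))


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between machine scores and human ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```

To evaluate a measure in this style, one would score every word pair in the benchmark and then call `pearson(scores, human_ratings)`, with the human ratings taken from the dataset itself. The threshold `c` sets very small co-occurrence counts to zero. This is only a crude form of the noise handling the abstract motivates. The proposed method goes further by also using snippets and result counts.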