User Behaviors in Related Word Retrieval and New Word Detection: A Collaborative Perspective

Authors:
Zhiyuan Liu;Yabin Zheng;Lixing Xie;Maosong Sun;Liyun Ru;Yang Zhang
Affiliations:
State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University;State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University;State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University;State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University;State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University;Sohu Inc. R&D Center
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2011

Abstract

User behavior analysis and collaborative filtering have attracted a large body of research in the machine learning community. The goal is either to enhance the user experience or to discover useful information hidden in the data. In this article, we conduct extensive experiments on a Chinese input method data set that records the word lists users have used. From a collaborative perspective, we address two tasks in natural language processing: related word retrieval and new word detection. Motivated by the observation that two words are usually highly related if they co-occur frequently in users’ records, we propose a novel semantic relatedness measure between words that takes both user behaviors and collaborative filtering into consideration. We apply this measure to both tasks, and experimental results indicate the applicability and effectiveness of our method.
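
The co-occurrence intuition stated in the abstract can be illustrated with a small sketch. The abstract does not give the article's actual measure, so this is not the authors' method: it assumes a simple cosine-style score over the sets of users who have used each word, and the function names, the toy records, and the scoring formula are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence_relatedness(user_words):
    """Score word pairs by how many users share them (illustrative only).

    Assumed formula, not taken from the article:
        score(a, b) = |U(a) & U(b)| / sqrt(|U(a)| * |U(b)|)
    where U(w) is the set of users whose word list contains w.
    """
    # Invert the records: word -> set of users who have used it.
    users_of = defaultdict(set)
    for user, words in user_words.items():
        for word in set(words):
            users_of[word].add(user)

    # Score every pair of words that shares at least one user.
    scores = {}
    for a, b in combinations(sorted(users_of), 2):
        shared = len(users_of[a] & users_of[b])
        if shared:
            scores[(a, b)] = shared / sqrt(len(users_of[a]) * len(users_of[b]))
    return scores


def related_words(seed, scores, top_n=3):
    """Return up to top_n (word, score) pairs most related to seed."""
    ranked = []
    for (a, b), score in scores.items():
        if a == seed:
            ranked.append((b, score))
        elif b == seed:
            ranked.append((a, score))
    # Highest score first; ties broken alphabetically for determinism.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]


# Hypothetical toy records: each user maps to the words they have used.
records = {
    "u1": ["tennis", "racket", "serve"],
    "u2": ["tennis", "racket"],
    "u3": ["tennis", "stock"],
    "u4": ["stock", "bond"],
}

scores = cooccurrence_relatedness(records)
top = related_words("tennis", scores)
print(top)
```

In this toy data, "racket" ranks first for the seed "tennis" because two of the three users who used "tennis" also used "racket". Normalizing by the square root of the two user counts is one way to keep very common words from dominating the ranking; other normalizations are equally plausible.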