Hybrid entity clustering using crowds and data

Authors:
Jongwuk Lee;Hyunsouk Cho;Jin-Woo Park;Young-Rok Cha;Seung-Won Hwang;Zaiqing Nie;Ji-Rong Wen
Affiliations:
Pohang University of Science and Technology (POSTECH), Pohang, Republic of Korea;Pohang University of Science and Technology (POSTECH), Pohang, Republic of Korea;Pohang University of Science and Technology (POSTECH), Pohang, Republic of Korea;Pohang University of Science and Technology (POSTECH), Pohang, Republic of Korea;Pohang University of Science and Technology (POSTECH), Pohang, Republic of Korea;Microsoft Research Asia, Beijing, People's Republic of China;Renmin University of China, Beijing, People's Republic of China
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2013

Abstract

Query result clustering has attracted considerable attention as a means of giving users a concise overview of results. However, little research effort has been devoted to organizing query results for entities, which refer to real-world concepts such as people, products, and locations. Entity-level result clustering is more challenging because diverse notions of similarity between entities must be supported across heterogeneous domains; for example, image resolution is an important feature for cameras but not for fruits. To address this challenge, we propose a hybrid relationship clustering algorithm, called Hydra, that uses co-occurrence and numeric features. Hydra captures diverse user perceptions from co-occurrence and disambiguates different senses using feature-based similarity. In addition, we extend Hydra into $\mathsf{Hydra}_{\mathsf{gData}}$, which incorporates different sources, i.e., entity types and crowdsourcing. Experimental results show that the proposed algorithms are effective and efficient on both real-life and synthetic datasets.
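
The idea of combining co-occurrence with feature-based similarity can be illustrated with a small sketch. The abstract does not give Hydra's actual formulation, so everything below is an assumption made for illustration: the function names, the linear blend weighted by `alpha`, the `threshold`, the single-link grouping, and the toy data are all hypothetical and are not the published algorithm.

```python
from itertools import combinations
from math import sqrt


def jaccard(a, b):
    """Co-occurrence similarity: overlap of the contexts two entities share."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Feature-based similarity between two numeric feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def hybrid_clusters(contexts, features, alpha=0.5, threshold=0.5):
    """Group entities whose blended similarity reaches `threshold`.

    `alpha` weights co-occurrence against feature similarity. The blend
    and the single-link grouping are illustrative assumptions, not the
    Hydra algorithm from the paper.
    """
    names = sorted(contexts)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        score = (alpha * jaccard(contexts[a], contexts[b])
                 + (1 - alpha) * cosine(features[a], features[b]))
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Hypothetical toy data: IDs of queries each entity appears in, and
# made-up two-dimensional numeric features.
contexts = {
    "camera_a": {1, 2, 3},
    "camera_b": {2, 3, 4},
    "apple": {5, 6},
    "pear": {6, 7},
}
features = {
    "camera_a": (1.0, 0.0),
    "camera_b": (0.9, 0.1),
    "apple": (0.0, 1.0),
    "pear": (0.1, 0.9),
}

print(hybrid_clusters(contexts, features))
```

With this toy data the two cameras fall into one group and the two fruits into another. Raising `alpha` makes the grouping depend more on co-occurrence, while lowering it favors the numeric features.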