Individuals contribute content to the Web at an unprecedented rate, accumulating immense quantities of (semi-)structured data. According to the Wisdom of the Crowds theory, such information (or parts of it) is constantly overwritten, updated, or even deleted by other users in order to make it more accurate or more up to date. This is particularly true for the collaboratively edited, semi-structured data of entity repositories, whose entity profiles are continually kept fresh. Therefore, the core information that remains stable over time, despite being reviewed by numerous users, is particularly useful for describing an entity. Based on this hypothesis, we introduce a classification scheme that predicts, on the basis of statistical and content patterns, whether an attribute (i.e., a name-value pair) is going to be modified in the future. We apply our scheme to a large, real-world, versioned dataset and verify its effectiveness. Our thorough experimental study also suggests that reducing entity profiles to their stable parts conveys significant benefits to two common tasks in computer science: information retrieval and information integration.
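
To make the notion of a profile's "stable part" concrete, the following is a minimal sketch in Python. It is not the classification scheme introduced here: it swaps in a simple frequency heuristic, keeping a name-value pair of the latest version only if that pair appeared in a large enough share of all versions. The function names, the `min_fraction` threshold, and the toy profile are all hypothetical and serve only as illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a profile version maps attribute names to values.
Profile = Dict[str, str]


def attribute_lifespans(versions: List[Profile]) -> Dict[Tuple[str, str], int]:
    """Count in how many versions each (name, value) pair appears."""
    counts: Dict[Tuple[str, str], int] = {}
    for profile in versions:
        for pair in profile.items():
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def stable_part(versions: List[Profile], min_fraction: float = 0.75) -> Profile:
    """Reduce the latest version to the pairs seen in at least
    `min_fraction` of all versions (an assumed heuristic, not a
    learned classifier)."""
    counts = attribute_lifespans(versions)
    latest = versions[-1]
    return {
        name: value
        for name, value in latest.items()
        if counts[(name, value)] / len(versions) >= min_fraction
    }


# Toy version history of one entity profile (invented values).
versions = [
    {"name": "Berlin", "population": "3.4M"},
    {"name": "Berlin", "population": "3.5M"},
    {"name": "Berlin", "population": "3.6M", "mayor": "A. Example"},
    {"name": "Berlin", "population": "3.6M", "mayor": "A. Example"},
]

print(stable_part(versions))
```

In this toy history only the `name` attribute survives every revision, so it alone is kept. A learned classifier, as described above, would replace the fixed threshold with a prediction based on statistical and content patterns.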