Disambiguating identity web references using Web 2.0 data and semantics

Authors:
Matthew Rowe; Fabio Ciravegna
Affiliations:
The OAK Group, Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK (both authors)
Venue:
Web Semantics: Science, Services and Agents on the World Wide Web
Year:
2010

Abstract

As web users disseminate more of their personal information on the web, the possibility of these users becoming victims of lateral surveillance and identity theft increases. Therefore, web resources containing this personal information, which we refer to as identity web references, must be found and disambiguated to produce a single set of web resources that refer to a given person. Such is the scale of the web that requiring users to monitor their identity web references manually is not feasible; automated approaches are therefore required. However, automated approaches require background knowledge about the person whose identity web references are to be disambiguated. In this paper we present a detailed approach to monitoring the web presence of a given individual by obtaining background knowledge from Web 2.0 platforms to support automated disambiguation. We present a methodology for generating this background knowledge by exporting data from multiple Web 2.0 platforms as RDF data models and combining these models for use as seed data. We present two disambiguation techniques: the first uses a semi-supervised machine learning technique known as Self-training, and the second uses a graph-based technique known as Random Walks. We explain how the semantics of the data supports the intrinsic functionalities of these techniques. We compare the performance of both techniques against several baseline measures, including human processing of the same data. Self-training achieves an average precision of 0.935 and Random Walks an average f-measure of 0.705, in both cases outperforming the baseline measures.
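As an illustration only, the following Python sketch shows the general shape of a positive-only self-training loop over web pages represented as sets of features. It is not the authors' implementation: the feature sets, the containment score, `threshold`, and `per_round` are hypothetical choices made for this example, and the `seed` set merely stands in for features that would be drawn from the combined RDF seed data.

```python
from typing import Dict, Set


def containment(page: Set[str], profile: Set[str]) -> float:
    """Share of a page's features already present in the profile (hypothetical score)."""
    return len(page & profile) / len(page) if page else 0.0


def self_train(seed: Set[str],
               pages: Dict[str, Set[str]],
               threshold: float = 0.5,
               per_round: int = 1) -> Set[str]:
    """Return the ids of pages labelled as referring to the seed person.

    Each round scores every unlabelled page against the current profile,
    labels the `per_round` most confident pages whose score clears
    `threshold` as positive, and folds their features into the profile
    (the "retraining" step). Stops when no page qualifies.
    """
    profile = set(seed)
    positives: Set[str] = set()
    while True:
        scored = sorted(
            ((containment(feats, profile), pid)
             for pid, feats in pages.items() if pid not in positives),
            reverse=True,
        )
        accepted = [pid for score, pid in scored[:per_round] if score >= threshold]
        if not accepted:
            return positives
        for pid in accepted:
            positives.add(pid)
            profile |= pages[pid]


# Toy, made-up data: seed features for a hypothetical person and three pages.
seed = {"alice smith", "sheffield", "oak group"}
pages = {
    "p1": {"alice smith", "sheffield", "semantic web"},
    "p2": {"alice smith", "semantic web", "linked data"},
    "p3": {"alice smith", "football", "manchester"},
}
print(sorted(self_train(seed, pages)))
```

With these toy inputs, `p2` scores only 1/3 against the seed alone and is accepted in the second round, once `p1`'s features have been folded into the profile; `p3` never clears the threshold. That incremental growth of the labelled set is what distinguishes self-training from a single-pass classifier.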