Wikipedia has emerged as an important source of structured information on the Web. But while the success of Wikipedia can be attributed in part to the simplicity of adding and modifying content, this simplicity has also created challenges when it comes to using, querying, and integrating the information. Even though authors are encouraged to select appropriate categories and provide infoboxes that follow pre-defined templates, many do not follow the guidelines or follow them only loosely. This leads to undesirable effects, such as template duplication, heterogeneity, and schema drift. As a step towards addressing this problem, we propose a new unsupervised approach for clustering Wikipedia infoboxes. Instead of relying on manually assigned categories and template labels, we use the structured information available in infoboxes to group them and infer their entity types. Experiments using over 48,000 infoboxes indicate that our clustering approach is effective and produces high-quality clusters.
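To make the idea of grouping infoboxes by their structured content concrete, the following is a minimal sketch, not the approach proposed in the paper. It assumes each infobox is reduced to its set of attribute names and groups infoboxes greedily by Jaccard similarity. The sample infoboxes, the attribute names, the `cluster_infoboxes` helper, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two attribute-name sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_infoboxes(
    infoboxes: Dict[str, Set[str]], threshold: float = 0.5
) -> List[List[str]]:
    """Greedy single-pass clustering (illustrative assumption, not the paper's method).

    Each infobox joins the first cluster whose representative (the cluster's
    first member) is at least `threshold` similar; otherwise it starts a new
    cluster. No category or template label is consulted.
    """
    clusters: List[List[str]] = []
    for title, attrs in infoboxes.items():
        for cluster in clusters:
            representative = infoboxes[cluster[0]]
            if jaccard(attrs, representative) >= threshold:
                cluster.append(title)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([title])
    return clusters


# Hypothetical infoboxes, reduced to their attribute names.
infoboxes = {
    "Brad Pitt": {"name", "born", "occupation", "spouse"},
    "Meryl Streep": {"name", "born", "occupation", "years_active"},
    "Inception": {"name", "director", "starring", "released"},
    "Titanic": {"name", "director", "starring", "budget"},
}

clusters = cluster_infoboxes(infoboxes)
print(clusters)
```

In this toy input the two person-like infoboxes share three of five distinct attributes (similarity 0.6) and end up together, as do the two film-like ones, even though no template label was used. A real system would need to handle attribute synonyms and noisy values, which this sketch ignores.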