Tabular data on the Web has become a rich source of structured data that ordinary users can explore. Because of this potential, Web tables have recently attracted a number of studies that aim to understand their semantics and to provide effective search and exploration mechanisms over them. An important part of table understanding and search is column concept determination, i.e., identifying the most appropriate concepts associated with the columns of a table. The problem becomes especially challenging with the availability of increasingly rich knowledge bases that contain hundreds of millions of entities. In this paper, we focus on an important instantiation of the column concept determination problem, in which the concepts of a column are determined by fuzzy matching its cell values to the entities within a large knowledge base. We provide an efficient MapReduce-based solution that scales with both the number of tables and the size of the knowledge base, and we propose two novel techniques: knowledge concept aggregation and knowledge entity partition. We prove that finding the optimal aggregation strategy and finding the optimal partition strategy are both NP-hard, and we propose efficient heuristic techniques that leverage the hierarchy of the knowledge base. Experimental results on real-world datasets show that our method achieves high annotation quality and performance, and scales well.
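To make the problem statement concrete, the following is a minimal single-machine sketch of column concept determination by fuzzy matching. It is not the paper's MapReduce algorithm and does not implement knowledge concept aggregation or knowledge entity partition. The similarity function (Jaccard over character 2-grams), the threshold of 0.6, the voting scheme, and the toy knowledge base `KB` are all assumptions chosen for illustration; the abstract does not specify them.

```python
from collections import Counter


def qgrams(s, q=2):
    """Return the set of character q-grams of a lowercased, '#'-padded string."""
    s = f"#{s.lower().strip()}#"
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def column_concepts(cells, kb, threshold=0.6):
    """Rank concepts for a column by fuzzy-matching its cells to KB entities.

    kb maps an entity name to its set of concepts (hypothetical toy structure).
    Each cell casts one vote for every concept of any entity it matches.
    This is a brute-force comparison, shown only to illustrate the problem.
    """
    votes = Counter()
    for cell in cells:
        cell_grams = qgrams(cell)
        matched = set()
        for entity, concepts in kb.items():
            if jaccard(cell_grams, qgrams(entity)) >= threshold:
                matched |= concepts
        votes.update(matched)
    return votes.most_common()


# Hypothetical toy knowledge base: entity name -> concepts.
KB = {
    "Seattle": {"city"},
    "Portland": {"city"},
    "Washington": {"state", "city"},
    "Oregon": {"state"},
}

# A column with misspelled cell values, as often found in Web tables.
ranked = column_concepts(["Seatle", "Portland", "Washingtn"], KB)
print(ranked)
```

The brute-force loop compares every cell with every entity, which is exactly the cost that becomes prohibitive for a knowledge base with hundreds of millions of entities and motivates a distributed solution.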