A web of concepts

Authors:
Nilesh Dalvi;Ravi Kumar;Bo Pang;Raghu Ramakrishnan;Andrew Tomkins;Philip Bohannon;Sathiya Keerthi;Srujana Merugu
Affiliations:
Yahoo! Research, Sunnyvale, CA, USA;Yahoo! Research, Sunnyvale, CA, USA;Yahoo! Research, Sunnyvale, CA, USA;Yahoo! Research, Sunnyvale, CA, USA;Yahoo! Research, Sunnyvale, CA, USA;Yahoo! Research, Sunnyvale, CA, USA;Yahoo! Research, Sunnyvale, CA, USA;Yahoo! Research, Sunnyvale, CA, USA
Venue:
Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2009



Abstract

We make the case for developing a web of concepts by starting with the current view of the web (composed of hyperlinked pages, or documents, each seen as a bag of words), extracting concept-centric metadata, and stitching it together to create a semantically rich aggregate view of all the information available on the web for each concept instance. Building and maintaining such a web of concepts presents many challenges, but it also promises to enable many powerful applications, including novel search and information discovery paradigms. We present the goal, motivate it with example usage scenarios and some analysis of Yahoo! logs, and discuss the challenges in building and leveraging such a web of concepts. We place this ambitious research agenda in the context of the state of the art in the literature, and describe various related ongoing efforts at Yahoo! Research.