We make the case for developing a web of concepts by starting with the current view of the web (composed of hyperlinked pages, or documents, each seen as a bag of words), extracting concept-centric metadata, and stitching it together to create a semantically rich aggregate view of all the information available on the web for each concept instance. Building and maintaining such a web of concepts presents many challenges, but it also promises to enable many powerful applications, including novel search and information discovery paradigms. We present the goal, motivate it with example usage scenarios and some analysis of Yahoo! logs, and discuss the challenges in building and leveraging such a web of concepts. We place this ambitious research agenda in the context of the state of the art in the literature and describe various related ongoing efforts at Yahoo! Research.