SemTag and seeker: bootstrapping the semantic web via automated semantic annotation

Authors:
Stephen Dill;Nadav Eiron;David Gibson;Daniel Gruhl;R. Guha;Anant Jhingran;Tapas Kanungo;Sridhar Rajagopalan;Andrew Tomkins;John A. Tomlin;Jason Y. Zien
Affiliation:
IBM Almaden Research Center, San Jose, CA (all authors)
Venue:
WWW '03 Proceedings of the 12th international conference on World Wide Web
Year:
2003

Abstract

This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages and generate approximately 434 million automatically disambiguated semantic tags, which are published to the web as a label bureau providing metadata about those annotations. To our knowledge, this is the largest-scale semantic tagging effort to date.

We describe the Seeker platform, discuss the architecture of the SemTag application, describe a new disambiguation algorithm specialized to support ontological disambiguation of large-scale data, evaluate the algorithm, and present our final results with information about acquiring and making use of the semantic tags. We argue that automated large-scale semantic tagging of ambiguous content can bootstrap and accelerate the creation of the semantic web.
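The abstract mentions automatically disambiguated tags and a new disambiguation algorithm but does not describe how that algorithm works. To make the general idea of "spot, then disambiguate" concrete, here is a minimal sketch of a generic technique: find words that match an ontology label, then choose among that label's candidate senses by comparing the surrounding context window with a per-sense bag of context words using cosine similarity. This is not the paper's algorithm. The toy ontology `SENSES`, the sense names, the window size `WINDOW` and the `threshold` parameter are all hypothetical assumptions for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy ontology (assumption): each label maps to candidate senses,
# and each sense is described by a bag of typical context words.
SENSES = {
    "jaguar": {
        "Animal/Jaguar": Counter({"cat": 2, "jungle": 2, "predator": 1, "species": 1}),
        "Automobile/Jaguar": Counter({"car": 2, "engine": 1, "sedan": 1, "dealer": 1}),
    },
}

WINDOW = 10  # assumed number of context words taken on each side of a spotted label


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def spot(tokens, labels):
    """Yield (position, label) for every token that exactly matches an ontology label."""
    for i, tok in enumerate(tokens):
        if tok in labels:
            yield i, tok


def disambiguate(text, senses=SENSES, window=WINDOW, threshold=0.0):
    """Return (position, label, sense) tags; spots with no sense above threshold are skipped."""
    tokens = tokenize(text)
    tags = []
    for i, label in spot(tokens, senses):
        # Context window: up to `window` tokens before and after the spotted label.
        context = Counter(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        scores = {sense: cosine(context, vec) for sense, vec in senses[label].items()}
        best = max(scores, key=scores.get)
        if scores[best] > threshold:
            tags.append((i, label, best))
    return tags


if __name__ == "__main__":
    print(disambiguate("The jaguar is a large cat that hunts in the jungle"))
    # → [(1, 'jaguar', 'Animal/Jaguar')]
```

Spots whose context shares no words with any sense (for example "A jaguar appeared") produce no tag, which reflects one plausible design choice: leave ambiguous mentions untagged rather than guess.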