Extracting Enterprise Vocabularies Using Linked Open Data

Authors:
Julian Dolby;Achille Fokoue;Aditya Kalyanpur;Edith Schonberg;Kavitha Srinivas
Affiliations:
IBM Watson Research Center, Yorktown Heights, USA 10598;IBM Watson Research Center, Yorktown Heights, USA 10598;IBM Watson Research Center, Yorktown Heights, USA 10598;IBM Watson Research Center, Yorktown Heights, USA 10598;IBM Watson Research Center, Yorktown Heights, USA 10598
Venue:
ISWC '09 Proceedings of the 8th International Semantic Web Conference
Year:
2009

Citing: 10 (works this paper references, listed first)
Cited: 4 (works that cite this paper, listed second)

Snowball: extracting relations from large plain-text collections

DL '00 Proceedings of the fifth ACM conference on Digital libraries
Learning by googling

ACM SIGKDD Explorations Newsletter
Automatic glossary extraction: beyond terminology identification

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Named entity recognition with a maximum entropy approach

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Automatically refining the wikipedia infobox ontology

Proceedings of the 17th international conference on World Wide Web
Information extraction from Wikipedia: moving down the long tail

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
YAGO: A Large Ontology from Wikipedia and WordNet

Web Semantics: Science, Services and Agents on the World Wide Web
Deriving a large scale taxonomy from Wikipedia

AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
PORE: positive-only relation extraction from wikipedia text

ISWC'07/ASWC'07 Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference
Towards knowledge acquisition from information extraction

ISWC'06 Proceedings of the 5th international conference on The Semantic Web

Large scale relation detection

FAM-LbR '10 Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading
Automatic metadata extraction from multilingual enterprise content

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
An academic search and analysis prototype for specific domain

APWeb'12 Proceedings of the 14th international conference on Web Technologies and Applications
Harnessing linked knowledge sources for topic classification in social media

Proceedings of the 24th ACM Conference on Hypertext and Social Media

Abstract

A common vocabulary is vital to smooth business operation, yet codifying and maintaining an enterprise vocabulary is an arduous, manual task. We describe a process to automatically extract a domain-specific vocabulary (terms and types) from unstructured enterprise data, guided by term definitions in Linked Open Data (LOD). We validate our techniques by applying them to the IT (Information Technology) domain, using 58 Gartner analyst reports and two LOD sources: DBpedia and Freebase. We show initial findings that address the generalizability of these techniques for vocabulary extraction in new domains, such as the energy industry.