Information Extraction: Distilling Structured Data from Unstructured Text

Authors:
Andrew McCallum
Affiliations:
University of Massachusetts, Amherst
Venue:
Queue - Social Computing
Year:
2005

Abstract

In 2001 the U.S. Department of Labor was tasked with building a Web site that would help people find continuing education opportunities at community colleges, universities, and organizations across the country. The department wanted its Web site to support fielded Boolean searches over locations, dates, times, prerequisites, instructors, topic areas, and course descriptions. Ultimately it was also interested in mining its new database for patterns and educational trends. This was a major data-integration project, aiming to automatically gather detailed, structured information from tens of thousands of individual institutions every three months.