Applying NLP technologies to the collection and enrichment of language data on the Web to aid linguistic research

Authors:
Fei Xia; William D. Lewis
Affiliations:
University of Washington, Seattle, WA; Microsoft Research, Redmond, WA
Venue:
LaTeCH-SHELT&R '09 Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education
Year:
2009

Abstract

The field of linguistics has always been reliant on language data, since that is its principal object of study. One of the major obstacles that linguists encounter is finding data relevant to their research. In this paper, we propose a three-stage approach to help linguists find relevant data. First, language data embedded in existing linguistic scholarly discourse is collected and stored in a database. Second, the language data is automatically analyzed and enriched, and language profiles are created from the enriched data. Third, a search facility is provided to allow linguists to search the original data, the enriched data, and the language profiles in a variety of ways. This work demonstrates the benefits of using natural language processing technology to create resources and tools for linguistic research, allowing linguists to have easy access not only to language data embedded in existing linguistic papers, but also to automatically generated language profiles for hundreds of languages.
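
The three stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the `Instance` schema, the function names `collect`, `enrich`, and `search`, the whitespace tokenization used as a stand-in for enrichment, and the sample sentences are all assumptions made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One language example taken from a scholarly document (hypothetical schema)."""
    language: str
    text: str
    source: str
    annotations: dict = field(default_factory=dict)


def collect(documents):
    """Stage 1: gather language data from documents into a store (here, a plain list)."""
    return [Instance(language, text, doc_id)
            for doc_id, examples in documents.items()
            for language, text in examples]


def enrich(instances):
    """Stage 2: annotate each instance and build one profile per language.

    Whitespace tokenization is only a placeholder for real NLP analysis.
    """
    profiles = defaultdict(lambda: {"instances": 0, "vocabulary": set()})
    for inst in instances:
        tokens = inst.text.lower().split()
        inst.annotations["tokens"] = tokens
        profile = profiles[inst.language]
        profile["instances"] += 1
        profile["vocabulary"].update(tokens)
    return dict(profiles)


def search(instances, language=None, token=None):
    """Stage 3: query the enriched store by language and/or token."""
    return [inst for inst in instances
            if (language is None or inst.language == language)
            and (token is None or token in inst.annotations.get("tokens", []))]


# Made-up sample input: document id -> list of (language, example sentence).
documents = {
    "doc-1": [("German", "Der Hund schläft"), ("Spanish", "El perro duerme")],
    "doc-2": [("German", "Der Mann liest")],
}

store = collect(documents)
profiles = enrich(store)
hits = search(store, language="German", token="der")
```

In this sketch the original data (`store`), the enriched annotations, and the per-language `profiles` remain separately accessible, mirroring the abstract's point that all three should be searchable.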