Using lexical patterns in the Google Web 1T corpus to deduce semantic relations between nouns

Authors:
Paul Nulty; Fintan Costello
Affiliations:
University College Dublin, Belfield, Dublin, Ireland; University College Dublin, Belfield, Dublin, Ireland
Venue:
DEW '09 Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions
Year:
2009

Abstract

This paper investigates methods for using lexical patterns in a corpus to deduce the semantic relation that holds between the two nouns in a noun-noun compound such as "flu virus" or "morning exercise". Much of the previous work in this area has relied on automated queries to commercial web search engines. In our experiments we use the Google Web 1T corpus instead. This corpus contains every 2-, 3-, 4-, and 5-gram occurring more than 40 times in Google's index of the web, and has the advantage of being available to researchers directly rather than through a web interface. We evaluate how a system built on the Web 1T corpus performs on this task compared with similar systems in the literature, and investigate which kinds of lexical patterns are most informative for identifying the semantic relation between two nouns.
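To make the idea of a lexical pattern concrete, the following is a minimal sketch of how the words joining two nouns might be collected from n-gram records. It is not the authors' implementation: the function name `joining_patterns`, the in-memory list of `(text, count)` pairs, the `N1`/`N2` placeholders, and the toy counts are all assumptions made for illustration, and the abstract does not say how the resulting patterns are fed to a classifier.

```python
from collections import Counter


def joining_patterns(ngrams, noun1, noun2):
    """Collect the word sequences found between two nouns in n-gram records.

    `ngrams` is an iterable of (text, count) pairs. This is a hypothetical
    in-memory stand-in for the corpus, not its actual file format.
    """
    patterns = Counter()
    for text, count in ngrams:
        tokens = text.lower().split()
        if noun1 not in tokens or noun2 not in tokens:
            continue
        i, j = tokens.index(noun1), tokens.index(noun2)
        if abs(i - j) < 2:
            continue  # adjacent nouns leave no joining words between them
        if i < j:
            pattern = ["N1", *tokens[i + 1:j], "N2"]
        else:
            pattern = ["N2", *tokens[j + 1:i], "N1"]
        # Weight each pattern by the frequency of the n-gram it came from.
        patterns[" ".join(pattern)] += count
    return patterns


# Toy records with invented counts, purely for illustration.
toy_ngrams = [
    ("virus that causes flu", 120),
    ("flu caused by a virus", 80),
    ("flu virus", 500),
    ("the virus that causes flu", 60),
    ("morning exercise is good", 45),
]

for pattern, freq in joining_patterns(toy_ngrams, "flu", "virus").most_common():
    print(pattern, freq)
```

In this sketch a pattern such as "N2 that causes N1" hints at a causal relation between the two nouns. Frequency-weighted pattern counts of this kind could serve as features for a relation classifier, though that step is outside what the sketch shows.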