This paper investigates methods for using lexical patterns in a corpus to deduce the semantic relation that holds between the two nouns in a noun-noun compound such as "flu virus" or "morning exercise". Much previous work in this area has relied on automated queries to commercial web search engines. Our experiments instead use the Google Web 1T corpus, which contains every 2-, 3-, 4- and 5-gram occurring more than 40 times in Google's index of the web and has the advantage of being available to researchers directly rather than through a web interface. We compare the performance of the Web 1T corpus on this task with that of similar systems in the literature, and we investigate which kinds of lexical patterns are most informative for identifying the semantic relation between two nouns.
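To make the idea of a lexical pattern concrete, the following is a minimal sketch of how joining patterns for a noun pair might be collected from n-gram count data. It assumes input lines in the tab-separated "tokens, then count" layout used by the Web 1T files. The function name `joining_patterns`, the `N1`/`N2` placeholder notation, and the sample lines and counts are all hypothetical illustrations, not the feature extraction or data used in the paper.

```python
from collections import Counter


def joining_patterns(ngram_lines, noun1, noun2):
    """Sum the counts of word sequences that join noun1 and noun2.

    Assumes each line has the form "w1 w2 ... wn<TAB>count".
    Only n-grams that start with one noun and end with the other
    are kept; the words in between form the pattern.
    """
    patterns = Counter()
    for line in ngram_lines:
        text, _, count = line.rstrip("\n").rpartition("\t")
        if not text or not count.isdigit():
            continue  # skip malformed lines
        tokens = text.lower().split(" ")
        if len(tokens) < 3:
            continue  # a bigram has no joining words
        first, last = tokens[0], tokens[-1]
        middle = " ".join(tokens[1:-1])
        if (first, last) == (noun1, noun2):
            patterns["N1 " + middle + " N2"] += int(count)
        elif (first, last) == (noun2, noun1):
            patterns["N2 " + middle + " N1"] += int(count)
    return patterns


# Made-up sample lines for illustration only; these are not real corpus counts.
sample = [
    "virus that causes flu\t120",
    "flu is a virus\t85",
    "virus causes flu\t60",
    "flu virus\t900",
    "morning exercise routine\t50",
]

result = joining_patterns(sample, "flu", "virus")
for pattern, count in result.most_common():
    print(pattern, count)
```

Pattern counts of this kind could then serve as features for a classifier that predicts the semantic relation, although the classifier and feature set actually used are not specified in this abstract.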