A Computer Science Text Corpus/Search Engine X-Tec and Its Applications

Authors:
Takehiro Tokuda;Yusuke Soyama;Tetsuya Suzuki
Affiliations:
{tokuda, soyama, tetsuya}@tt.cs.titech.ac.jp, Dept. of Computer Science, Tokyo Institute of Technology, Meguro, Tokyo 152-8552, Japan
Venue:
Proceedings of the 2006 conference on Information Modelling and Knowledge Bases XVII
Year:
2006

Abstract

We built a computer science text corpus/search engine called X-Tec. Over 678 hours, we automatically collected 2.98 million sentences (68.9 million words) from carefully chosen English computer science documents on the Web. We also built an interactive sample sentence query system and an automatic expression diagnostic system for graduate students. Our computer science text corpus/search engine can also be used for knowledge search and word co-occurrence frequency retrieval.
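
The abstract does not describe how X-Tec computes word co-occurrence frequencies, so the following is only a minimal sketch of one common approach: counting how many sentences contain both words of a pair. The function names, the tokenization rule, and the sample sentences are all hypothetical and are not taken from X-Tec.

```python
from collections import Counter
from itertools import combinations
import re


def tokenize(sentence):
    """Lowercase a sentence and extract its alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cooccurrence_counts(sentences):
    """For each unordered word pair, count the sentences containing both words."""
    counts = Counter()
    for sentence in sentences:
        # Deduplicate so a pair is counted at most once per sentence;
        # sorting makes each pair's key order-independent.
        words = sorted(set(tokenize(sentence)))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


def cooccurrence(counts, a, b):
    """Look up the sentence-level co-occurrence frequency of two words."""
    return counts[tuple(sorted((a.lower(), b.lower())))]


# Hypothetical sample sentences standing in for a corpus (assumption).
sample = [
    "We propose an algorithm for sorting.",
    "The algorithm runs in linear time.",
    "We propose a linear time algorithm.",
]
counts = cooccurrence_counts(sample)
print(cooccurrence(counts, "propose", "algorithm"))
```

Counting all pairs per sentence is quadratic in sentence length, so a corpus of millions of sentences would more plausibly rely on an inverted index or a restricted vocabulary; this sketch only illustrates the quantity being retrieved.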