A Computer Science Text Corpus/Search Engine X-Tec and Its Applications

  • Authors:
  • Takehiro Tokuda;Yusuke Soyama;Tetsuya Suzuki

  • Affiliations:
  • {tokuda, soyama, tetsuya}@tt.cs.titech.ac.jp, Dept. of Computer Science, Tokyo Institute of Technology, Meguro, Tokyo 152-8552, Japan

  • Venue:
  • Proceedings of the 2006 conference on Information Modelling and Knowledge Bases XVII
  • Year:
  • 2006

Abstract

We built a computer science text corpus/search engine called X-Tec. In 678 hours, we automatically collected 2.98 million sentences (68.9 million words) from carefully chosen English computer science documents on the Web. We also built an interactive sample sentence query system and an automatic expression diagnostic system for graduate students. Our computer science text corpus/search engine can also be used for knowledge search and word co-occurrence frequency retrieval.
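
The abstract does not describe how X-Tec implements its queries, so the following is only a minimal sketch of what sample sentence lookup and word co-occurrence frequency retrieval could look like over a list of sentences. All function names (`tokenize`, `sample_sentences`, `cooccurrence_counts`, `cooccurrence`) are hypothetical, and counting co-occurrence at the sentence level is an assumption made for illustration, not the authors' stated method.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(sentence):
    """Lowercase a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())


def sample_sentences(sentences, phrase):
    """Return the sentences that contain `phrase` as a contiguous token sequence."""
    target = tokenize(phrase)
    if not target:
        return []
    n = len(target)
    hits = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        if any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1)):
            hits.append(sentence)
    return hits


def cooccurrence_counts(sentences):
    """Count, for every unordered word pair, how many sentences contain both words.

    Assumption: co-occurrence means "appears in the same sentence".
    """
    counts = Counter()
    for sentence in sentences:
        # Deduplicate and sort so each pair is counted once per sentence
        # and always stored in the same order.
        words = sorted(set(tokenize(sentence)))
        counts.update(combinations(words, 2))
    return counts


def cooccurrence(counts, a, b):
    """Look up the sentence-level co-occurrence frequency of two words."""
    return counts[tuple(sorted((a.lower(), b.lower())))]
```

On a real corpus of millions of sentences, an inverted index from words to sentence identifiers would likely be needed instead of the linear scan used here; the sketch favors clarity over scale.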