Database optimization for novelty detection

  • Authors:
  • Ong Chun Lin;Agus T. Kwee;Flora S. Tsai

  • Affiliations:
  • School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore;School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore;School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore

  • Venue:
  • ICICS'09 Proceedings of the 7th international conference on Information, communications and signal processing
  • Year:
  • 2009

Abstract

Research on optimizing databases in Database Management Systems (DBMS) is constantly evolving. Today, programming languages are being integrated into database systems to help professional programmers develop software quickly and meet deadlines. The design of a database must therefore cater to both the needs of customers and the efficiency of database processes. This paper considers a database application, novelty detection, which identifies new documents for readers who do not want to reread redundant ones. The application needs a database to store both history and current documents. The objective of this research is to optimize the database tables for up to 10 million records. The experiments are conducted at both the sentence level and the document level. At both levels, we investigate data optimization and the use of proper indexing. In MySQL, the B-Tree index is used to speed up data selection. In addition, the EXPLAIN statement enables us to index the correct data columns and to avoid redundant indexing. Optimization of data types is also investigated to ensure that MySQL does no extra work when selecting data. A technique known as batching is also introduced to speed up the insertion of results after novelty detection has been performed. Overall, the combined optimizations improved speed by up to 90%. We have thus successfully optimized the database for novelty detection, and the techniques have been integrated into a real-time novelty detection application.
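
The abstract does not give the schema or the batching code, so the following is only a minimal sketch of the three ideas it names (indexing the lookup column, checking the query plan, and inserting results in batches). It assumes Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the table name `results`, the columns `doc_id` and `is_novel`, the index name, and the batch size are all hypothetical. SQLite's `EXPLAIN QUERY PLAN` plays the role that `EXPLAIN` plays in MySQL.

```python
import sqlite3


def insert_results_batched(conn, rows, batch_size=1000):
    """Insert (doc_id, is_novel) rows in batches, committing once per batch.

    Returns the number of batches written. The table and column names are
    hypothetical; the paper's actual schema is not given in the abstract.
    """
    cur = conn.cursor()
    batches = 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # One executemany call per batch instead of one statement per row.
        cur.executemany(
            "INSERT INTO results (doc_id, is_novel) VALUES (?, ?)", chunk
        )
        conn.commit()
        batches += 1
    return batches


# In-memory SQLite database standing in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (doc_id INTEGER NOT NULL, is_novel INTEGER NOT NULL)"
)
# Index only the column used in the WHERE clause (SQLite indexes are B-trees).
conn.execute("CREATE INDEX idx_results_doc_id ON results (doc_id)")

rows = [(i, i % 2) for i in range(2500)]
n_batches = insert_results_batched(conn, rows, batch_size=1000)

# Ask the engine how it would run the lookup, and check that the index is used.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT is_novel FROM results WHERE doc_id = 42"
).fetchall()
uses_index = any("idx_results_doc_id" in row[-1] for row in plan)
row_count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```

With a MySQL driver the same pattern would apply, but the placeholder syntax, the EXPLAIN output format, and the best batch size would differ and should be measured on the real tables.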