Scamseek: a language technology project fulfilling research objectives with industrial obligations

  • Authors: Jon Patrick

  • Affiliations: School of Information Technologies, University of Sydney, Sydney, New South Wales, Australia

  • Venue: SEARCC '05 Proceedings of the 2005 South East Asia Regional Computer Science Confederation (SEARCC) Conference - Volume 46

  • Year: 2005

Abstract

The Scamseek project, commissioned by the Australian Securities and Investments Commission (ASIC), had the principal objective of building an industrially viable system that retrieves candidate scam texts from the Internet and classifies them by their potential risk of containing an illegal investment proposal or advice. The value of the system lies in the significant time and efficiency savings it offers the human analyst. However, the classification precision required to discover classes comprising less than 1% of the corpus was considered unachievable with conventional word-based text classification methods, so a hitherto unexplored semantic model of language was adopted to express the feature space of the documents. The project was thus defined in terms of research objectives, in particular the accurate detection of very small classes and the modelling of text classification through a strong linguistic theory. At the same time, the project was obliged to produce an industrial-quality system that met the associated performance criteria.