Using N-grams to identify mathematical topics in MXit lingo

  • Authors:
  • Laurie L. Butgereit; Reinhardt A. Botha

  • Affiliations:
  • Nelson Mandela Metropolitan University, Summerstrand, Port Elizabeth, RSA; Nelson Mandela Metropolitan University, Summerstrand, Port Elizabeth, RSA

  • Venue:
  • Proceedings of the South African Institute of Computer Scientists and Information Technologists Conference on Knowledge, Innovation and Leadership in a Diverse, Multidisciplinary Environment
  • Year:
  • 2011

Abstract

N-grams are used to quantify the similarity between two documents or between two collections of words. This paper shows how N-grams of length 3 and of length 4, both coupled with text preprocessing (including stop word removal and stemming according to MXit spelling conventions), can be used to categorize very short mathematical conversations conducted in MXit lingo into broad mathematical groups such as algebra, geometry, trigonometry, and calculus. MXit lingo is an abbreviated form of written English that children, teenagers and young adults use when communicating through the popular MXit chat mechanism over cell phones. Conversations from the "Dr Math" project were used for this analysis. "Dr Math" is a mathematics tutoring service that links primary and secondary school pupils to tutors from local universities, who assist the pupils with their mathematics homework.
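
The general idea of N-gram matching on short, loosely spelled messages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say whether the N-grams are taken over characters or words, which similarity measure is used, or what the stop word list and stemming rules contain. The sketch assumes character N-grams and a Dice coefficient, and the names `STOP_WORDS`, `TOPICS`, `char_ngrams`, `dice_similarity` and `classify`, along with the word lists themselves, are hypothetical. The MXit-specific stemming step is omitted.

```python
from collections import Counter

# Hypothetical stop words (assumption: the paper's actual list is not given).
STOP_WORDS = {"the", "a", "is", "of", "da", "u", "i", "hw", "do", "wat", "plz"}

# Hypothetical topic vocabularies (assumption: invented for illustration only).
TOPICS = {
    "algebra": "factorise equation solve quadratic exponent",
    "geometry": "triangle circle angle parallel area",
    "trigonometry": "sin cos tan theta hypotenuse",
    "calculus": "derivative differentiate limit integral gradient",
}


def char_ngrams(text, n):
    """Count character n-grams in every token that is not a stop word."""
    grams = Counter()
    for word in text.lower().split():
        if word in STOP_WORDS:
            continue
        padded = f"_{word}_"  # underscores mark the word boundaries
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def dice_similarity(a, b):
    """Dice coefficient over the sets of distinct n-grams in a and b."""
    if not a or not b:
        return 0.0
    keys_a, keys_b = set(a), set(b)
    return 2 * len(keys_a & keys_b) / (len(keys_a) + len(keys_b))


def classify(message, n=3):
    """Return the best-scoring topic and the score for every topic."""
    msg_grams = char_ngrams(message, n)
    scores = {
        topic: dice_similarity(msg_grams, char_ngrams(vocab, n))
        for topic, vocab in TOPICS.items()
    }
    return max(scores, key=scores.get), scores


topic, scores = classify("hw do u factrise da equasion")
print(topic, scores)
```

Because the comparison works on fragments of words rather than whole words, a misspelling such as "factrise" still shares several trigrams with "factorise", which is why this family of techniques suits abbreviated chat text. Passing `n=4` to `classify` switches the sketch to N-grams of length 4.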