Web-Based Lemmatisation of Named Entities

  • Authors:
  • Richárd Farkas; Veronika Vincze; István Nagy; Róbert Ormándi; György Szarvas; Attila Almási

  • Affiliations:
  • MTA-SZTE, Research Group on Artificial Intelligence, Szeged, Hungary 6720; Department of Informatics, University of Szeged, Szeged, Hungary 6720; Department of Informatics, University of Szeged, Szeged, Hungary 6720; MTA-SZTE, Research Group on Artificial Intelligence, Szeged, Hungary 6720; MTA-SZTE, Research Group on Artificial Intelligence, Szeged, Hungary 6720; Department of Informatics, University of Szeged, Szeged, Hungary 6720

  • Venue:
  • TSD '08 Proceedings of the 11th international conference on Text, Speech and Dialogue
  • Year:
  • 2008

Abstract

Identifying the lemma of a Named Entity is important for many Natural Language Processing applications, such as Information Retrieval. Here we introduce a novel approach to Named Entity lemmatisation that uses the web occurrence frequency of each possible lemma. We constructed four corpora in English and Hungarian and used them to train machine learning methods, obtaining simple decision rules based on the web frequencies of the candidate lemmas. In experiments, our web-based heuristic achieved an average accuracy of nearly 91%.
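
The idea of choosing among candidate lemmas by their web frequencies can be illustrated with a minimal Python sketch. It is not the authors' method: the suffix list, the single ratio rule, the `ratio` parameter, the function names, and the counts in `TOY_COUNTS` are all hypothetical and made up for illustration, whereas the abstract describes rules learned from corpora and frequencies obtained from the web.

```python
from typing import Dict, List

# Hypothetical suffixes to strip; not taken from the paper.
SUFFIXES = ["'s", "s"]

# Made-up occurrence counts standing in for web hit counts (illustrative only).
TOY_COUNTS: Dict[str, int] = {
    "Toyota's": 200,
    "Toyota": 5000,
    "Adidas": 1000,
    "Adida": 10,
}


def candidate_lemmas(token: str) -> List[str]:
    """Return the token itself followed by each form with one suffix stripped."""
    candidates = [token]
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix):
            candidates.append(token[: -len(suffix)])
    return candidates


def choose_lemma(token: str, counts: Dict[str, int], ratio: float = 1.0) -> str:
    """Pick a lemma with one assumed decision rule.

    The most frequent stripped form wins only if it occurs at least
    `ratio` times as often as the full token; otherwise the token is kept.
    """
    alternatives = candidate_lemmas(token)[1:]
    if not alternatives:
        return token
    best = max(alternatives, key=lambda c: counts.get(c, 0))
    best_count = counts.get(best, 0)
    if best_count > 0 and best_count >= ratio * counts.get(token, 0):
        return best
    return token
```

With these toy counts, `choose_lemma("Toyota's", TOY_COUNTS)` strips the possessive marker, while `choose_lemma("Adidas", TOY_COUNTS)` keeps the final "s" because the stripped form is much rarer than the full name.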