Challenges in Urdu Stemming (A Progress Report)

BCS IRSG Symposium: Future Directions in Information Access 2007

Glasgow, 28 - 29 August 2007

AUTHORS

Kashif Riaz

ABSTRACT

This paper explains the challenges of Urdu stemming and presents a rule-based prototype, with a few rules implemented for Urdu, to illustrate the intricacies involved. It shows that Urdu stemming is quite challenging because of Urdu's diverse nature and because Arabic and Farsi stemmers cannot be used for Urdu.

Dictionary-based error-correcting schemes used by other stemmers cannot be applied to Urdu because of the lack of machine-readable resources. No work on Urdu stemming or morphological analysis has been published in the IR community, even though interest in Urdu is growing. The goal of this paper is to show the challenges in writing an Urdu stemmer, not to present a stemmer.

PAPER FORMATS

PDF Version of this Paper (61kb)