BCS is a registered charity: No 292786
BCS IRSG Symposium: Future Directions in Information Access 2007
Glasgow, 28 - 29 August 2007
This paper explains the challenges of Urdu stemming and presents a rule-based prototype, with a small number of implemented rules, to illustrate its intricacies. It shows that Urdu stemming is difficult because of Urdu's diverse nature and because existing Arabic and Farsi stemmers cannot be used for Urdu.
Dictionary-based error-correcting schemes used by other stemmers cannot be applied to Urdu because machine-readable resources for the language are lacking. No work on Urdu stemming or morphological analysis has been published in the IR community, even though interest in Urdu is growing. The goal of this paper is to show the challenges of writing an Urdu stemmer, not to present a complete stemmer.
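The abstract does not list the prototype's rules. The sketch below is therefore only an illustration of what a rule-based suffix-stripping stemmer looks like. It assumes a hypothetical list of three Urdu plural/oblique endings (not the paper's rule set), strips the longest matching suffix, and refuses to leave a stem shorter than a hypothetical minimum length. Strings are written as Unicode escapes so that visually similar Arabic-script code points cannot be confused.

```python
# Hypothetical illustrative suffixes in Urdu script. These are NOT the paper's rules.
SUFFIXES = sorted(
    (
        "\u0648\u06ba",  # "وں", an oblique plural ending
        "\u06cc\u06ba",  # "یں", a plural ending, as in کتابیں (books)
        "\u06d2",        # "ے", a masculine plural/oblique ending, as in لڑکے (boys)
    ),
    key=len,
    reverse=True,  # try longer suffixes first (longest match)
)

# Hypothetical lower bound on stem length, to avoid stripping a word down to nothing.
MIN_STEM_LEN = 2


def strip_suffix(word: str) -> str:
    """Remove the longest listed suffix, keeping at least MIN_STEM_LEN characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word  # no rule applies: return the word unchanged


if __name__ == "__main__":
    books = "\u06a9\u062a\u0627\u0628\u06cc\u06ba"  # کتابیں
    boys = "\u0644\u0691\u06a9\u06d2"               # لڑکے
    boy = "\u0644\u0691\u06a9\u0627"                # لڑکا
    for w in (books, boys, boy):
        print(w, "->", strip_suffix(w))
```

Even this toy rule set reproduces one of the intricacies the paper points to. Stripping ے from لڑکے (boys) yields لڑک, which does not match the singular لڑکا (boy), so the two forms are not conflated. Handling such alternations requires further rules or the kind of dictionary-based correction that, as noted above, Urdu lacks the resources to support.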