This page lists artifacts developed by members of the Kurdish Language Processing Project (KLPP) team.
For more information about KLPP, see:
http://eng.uok.ac.ir/esmaili/research/klpp/en/main.htm
Pewan: The Kurdish Text Corpus / Test Collection
Pewan contains a large text corpus (115,000+ Sorani and 25,000+ Kurmanji news articles), 22 queries (in Sorani, Kurmanji, Persian, and English) and their corresponding relevance judgments. Two lists of stopwords (one Sorani, one Kurmanji) are also included.
Get Pewan's mirrored copy from Dropbox.
Kurdish Stemmers
We have developed two stemmers that cover both dialects of Kurdish (Sorani and Kurmanji):
- Jedar: a new rule-based stemmer which uses a list of Kurdish suffixes
- GRAS: an implementation of a state-of-the-art statistical stemmer, proposed by J. H. Paik et al. in 2011.
The Java source code for these stemmers can be obtained from here.
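To give a rough idea of what a rule-based, suffix-list stemmer does, here is a minimal Java sketch of longest-match suffix stripping. It is not Jedar's code: the class name, the three Latin-transliterated suffixes, and the minimum stem length are assumptions chosen only for illustration, and the real stemmer's suffix list and rules may differ considerably.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; this is NOT the Jedar implementation.
final class SuffixStemmerSketch {

    // Hypothetical suffix list (Latin transliteration), ordered longest first
    // so that the longest matching suffix is stripped. Not Jedar's actual list.
    private static final List<String> SUFFIXES = Arrays.asList("ekan", "eke", "an");

    // Assumed lower bound on stem length, to avoid stripping very short words.
    private static final int MIN_STEM_LENGTH = 3;

    // Removes the first (longest) matching suffix, or returns the word unchanged.
    static String stem(String word) {
        for (String suffix : SUFFIXES) {
            int stemLength = word.length() - suffix.length();
            if (word.endsWith(suffix) && stemLength >= MIN_STEM_LENGTH) {
                return word.substring(0, stemLength);
            }
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(stem("malekan")); // prints "mal"
        System.out.println(stem("malan"));   // prints "mal"
        System.out.println(stem("mal"));     // prints "mal" (no suffix matched)
    }
}
```

Checking the longer suffixes first keeps "malekan" from being cut to "malek" by the shorter "an" rule. The minimum stem length is a common safeguard in suffix strippers, not something stated on this page.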
Kurdish Keyboard Layouts (for Windows and Macintosh)