KLPP on GitHub

This page contains a list of artifacts developed by the Kurdish Language Processing Project (KLPP) team members. For more information about KLPP, see:


Pewan: The Kurdish Text Corpus / Test Collection

Pewan contains a large text corpus (115,000+ Sorani and 25,000+ Kurmanji news articles), 22 queries (in Sorani, Kurmanji, Persian, and English) and their corresponding relevance judgments. Two lists of stopwords (one Sorani, one Kurmanji) are also included.

A mirrored copy of Pewan is available from Dropbox.
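Pewan's stopword lists are typically applied as a filter before indexing or querying. The sketch below shows one way to do that in Java. It assumes each list is a UTF-8 text file with one word per line. That format is an assumption, so check the actual files. The class and method names (`StopwordFilter`, `loadStopwords`, `filter`, `tokenize`) are hypothetical and are not part of Pewan or the KLPP stemmers. The whitespace tokenizer is deliberately naive.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Pewan or KLPP: a minimal stopword filter.
final class StopwordFilter {

    // Reads a stopword list, ASSUMED to be UTF-8 text with one word per line.
    // Surrounding whitespace is trimmed and blank lines are skipped.
    static Set<String> loadStopwords(Path file) throws IOException {
        Set<String> stopwords = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String word = line.strip();
            if (!word.isEmpty()) {
                stopwords.add(word);
            }
        }
        return stopwords;
    }

    // Naive whitespace tokenizer; real Sorani/Kurmanji text would also need
    // punctuation handling and script-specific normalization.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns the tokens that are not stopwords, preserving their order.
    static List<String> filter(List<String> tokens, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopwords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to one of the stopword lists; args[1]: text to filter.
        if (args.length < 2) {
            System.err.println("usage: java StopwordFilter <stopword-file> <text>");
            return;
        }
        Set<String> stopwords = loadStopwords(Path.of(args[0]));
        System.out.println(filter(tokenize(args[1]), stopwords));
    }
}
```

A `HashSet` keeps each stopword lookup constant-time on average, which matters when filtering a corpus of 140,000+ articles. In a full pipeline, stemming would normally follow this step.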

Kurdish Stemmers

We have developed two stemmers for both dialects of the Kurdish language (Sorani and Kurmanji):

The Java source code for these stemmers can be obtained from here.

Kurdish Keyboard Layouts (for Windows and Macintosh)