
Farasa is a state-of-the-art, full-stack package for Arabic language processing. It has been developed by the Arabic Language Technologies Group at Qatar Computing Research Institute (QCRI). It has a RESTful Web API that you can use from your favorite programming language.


Why Farasa?!

Super Ultra Fast

  • When evaluating 7.4 million words on an i5 laptop:

    #  Package   Time
    1  Farasa    129 seconds
    2  Qatara    18 minutes
    3  MADAMIRA  ~2.5 hours
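
To put the timings above on a common scale, the following minimal sketch converts them to seconds and derives throughput and relative run time. The only inputs are the figures quoted in the table; the MADAMIRA time is approximate in the source, so its derived numbers are rough as well. The function names are illustrative.

```python
# Timings quoted in the table above for 7.4 million words on an i5 laptop.
WORDS = 7_400_000
TIMES_SECONDS = {
    "Farasa": 129,
    "Qatara": 18 * 60,        # 18 minutes
    "MADAMIRA": 2.5 * 3600,   # about 2.5 hours (approximate in the source)
}


def words_per_second(seconds: float) -> float:
    """Throughput for processing the full 7.4-million-word sample."""
    return WORDS / seconds


def relative_time(package: str, baseline: str = "Farasa") -> float:
    """How many times longer `package` takes than `baseline`."""
    return TIMES_SECONDS[package] / TIMES_SECONDS[baseline]


if __name__ == "__main__":
    for name, seconds in TIMES_SECONDS.items():
        print(
            f"{name:9s} {words_per_second(seconds):>10,.0f} words/s  "
            f"{relative_time(name):5.1f}x Farasa's run time"
        )
```

By this arithmetic Farasa processes roughly 57 thousand words per second, and the other two packages take about 8 and about 70 times as long on the same input.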

Easy to Install & Use

We provide many ways to use Farasa. To name a few, you can:

  • Use our Web API services
  • Download it and use the JAR files we provide
  • Download the source code and build your own package
  • Use the online demo page to process a size-limited portion of text
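
For the JAR option, a script can drive the downloaded file through the Java launcher. The sketch below is only an illustration: the JAR file name and the `-i`/`-o` input and output flags are assumptions that are not stated on this page, so check the documentation shipped with the module you download before relying on them.

```python
import subprocess
from typing import List


def build_command(jar_path: str, input_path: str, output_path: str) -> List[str]:
    """Build the argument list for running a Farasa JAR on a text file.

    The -i / -o flags are an assumption, not confirmed by this page.
    """
    return ["java", "-jar", jar_path, "-i", input_path, "-o", output_path]


def run_jar(jar_path: str, input_path: str, output_path: str) -> None:
    """Run the JAR and raise CalledProcessError if Java exits with an error."""
    subprocess.run(build_command(jar_path, input_path, output_path), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_command("FarasaSegmenterJar.jar", "input.txt", "output.txt")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.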

Outperforms other Tools and Packages


Artificial Intelligence & Machine Learning


Well Documented

Clear documentation plus a How-To section. You can easily read, edit, and write your own code.

Support All Platforms

We support Windows 10, macOS, and every Linux distribution, which lets us cover over 98% of the OS market.

Open source

The Farasa source code is available to download. You can change it and build your own modules, as long as this is for academic or research purposes.

Free for Academic and Research purposes

The QCRI FARASA package for processing Arabic text is being made public for research purposes only.

For non-research use, please contact us.

Try the RESTful Web API

You can try all Farasa modules through the Web API in the text area below. However, we impose some limits on the amount of text you can try and on the number of trials.

RESTful Web API?! It's as easy as 1-2-3

Register First

We provide a Web API for every Farasa module, along with sample code in Python, Java, JavaScript, and cURL that shows how to use the API endpoints.

This saves you the hassle of downloading and installing the software on your machine, and it works no matter how thin the client machine might be.

The Web API service is totally free of charge. You just need to register, for security reasons, and then you can start using the API from your favorite programming language.

Join by filling in the form on the registration page of the Farasa website.

Confirm & Login

Check your registered e-mail, where you will find your API key. You can now log in and go to your profile to view your API usage.

It's as easy as Copy & Paste to get started

Once you have logged in, go to the desired Farasa module and find the Web API section on the page. Select the language tab you want and copy the source code snippet into your IDE or editor.

And hooray, you are done! :)

We provide snippets for Python, Java, JavaScript, and cURL.
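
As a rough idea of what such a call can look like in Python, here is a minimal sketch using only the standard library. The base URL, the per-module path layout, the `text` and `api_key` field names, and the JSON response format are all assumptions made for illustration; the snippet in the Web API section of each module page is the authoritative version.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL and path layout; confirm against the module's Web API section.
BASE_URL = "https://farasa.qcri.org/webapi"


def build_request(module: str, text: str, api_key: str) -> urllib.request.Request:
    """Build a form-encoded POST request for one Farasa module (no network access)."""
    payload = urllib.parse.urlencode({"text": text, "api_key": api_key}).encode("utf-8")
    return urllib.request.Request(f"{BASE_URL}/{module}/", data=payload, method="POST")


def call_farasa(module: str, text: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the body, assuming the service replies with JSON."""
    request = build_request(module, text, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like `call_farasa("segmentation", "some Arabic text", api_key)`, where `"segmentation"` is an assumed module name and `api_key` is the key you received by e-mail. Reading the key from an environment variable or a local configuration file keeps it out of source control.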

Download Farasa


Register First

As with the RESTful Web API, you need to register before you are allowed to download a module. If you have already registered, just log in.



Just fill in your login information and sign in.


Press the Download Button

After logging in, go to the module page and click the download link. The download will start automatically.

Our Team

Dr. Kareem Darwish, Principal Scientist
Hamdy Mubarak, Principal Software Engineer
Dr. Ahmed Abdelali, Principal Software Engineer
Mohamed Eldesouki, Research Associate
Dr. Younes Samih, PostDoc Researcher
Sabit Hassan, Research Assistant

Publications

  • Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, and Mohamed Eldesouki, “Arabic diacritic recovery using a feature-rich biLSTM model,” arXiv preprint arXiv:2002.01207, 2020.
  • Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Mohamed Eldesouki, Younes Samih, and Hassan Sajjad, “A system for diacritizing four varieties of Arabic,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
  • Hamdy Mubarak, Ahmed Abdelali, Hassan Sajjad, Younes Samih, and Kareem Darwish, “Highly Effective Arabic Diacritization using Sequence to Sequence Modeling,” in Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2019.
  • Ahmed Abdelali, Mohammed Attia, Younes Samih, Kareem Darwish, and Hamdy Mubarak, “Diacritization of Maghrebi Arabic sub-dialects,” arXiv preprint arXiv:1810.06619, 2018.
  • Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali, Mohamed Eldesouki, Younes Samih, Randah Alharbi, Mohammed Attia, Walid Magdy, and Laura Kallmeyer, “Multi-dialect Arabic POS tagging: a CRF approach,” in 11th edition of the Language Resources and Evaluation Conference, 2018.
  • Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Younes Samih, and Mohammed Attia, “Diacritization of Moroccan and Tunisian Arabic dialects: a CRF approach,” in Proceedings of the 4th Arabic Natural Language Processing Workshop (WANLP-2018), the 11th edition of the Language Resources and Evaluation Conference, 2018.
  • Hamdy Mubarak, "Build Fast and Accurate Lemmatization for Arabic," in Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018.
  • Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali, and Mohamed Eldesouki, "Arabic POS tagging: Don’t abandon feature engineering just yet," in Proceedings of the Third Arabic Natural Language Processing Workshop, 2017, pp. 130-137.
  • Mohamed Eldesouki, Younes Samih, Ahmed Abdelali, Mohammed Attia, Hamdy Mubarak, Kareem Darwish, and Laura Kallmeyer, “Arabic multi-dialect segmentation: bi-LSTM-CRF vs. SVM,” arXiv preprint arXiv:1708.05891, 2017.
  • Ahmed Abdelali, Kareem Darwish, Nadir Durrani, Hamdy Mubarak. "Farasa: A Fast and Furious Segmenter for Arabic," NAACL-2016
  • Kareem Darwish and Hamdy Mubarak. 2016. "Farasa: A New Fast and Accurate Arabic Word Segmenter," LREC-2016.
  • Zhang, Yuan, Chengtao Li, Regina Barzilay, and Kareem Darwish. “Randomized Greedy Inference for Joint Segmentation, POS Tagging and Dependency Parsing.” In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 42-52. 2015.
  • Hamdy Mubarak, Kareem Darwish, Ahmed Abdelali. 2015. “QCRI@QALB-2015 Shared Task: Correction of Arabic Text for Native and Non-Native Speakers’ Errors.” Proceedings of the ACL 2015 Second Workshop on Arabic Natural Language Processing.
  • Kareem Darwish, Wei Gao. 2014. "Simple Effective Microblog Named Entity Recognition: Arabic as an Example," LREC-2014.
  • Hamdy Mubarak and Kareem Darwish, “Automatic correction of Arabic text: a cascaded approach,” in Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), 2014, pp. 132–136.
  • Kareem Darwish. 2013. "Named Entity Recognition using Cross-lingual Resources: Arabic as an Example," ACL-2013.