
Tolegen Gulmira

Researcher at the Institute of Information and Computational Technologies


In 2013, Tolegen G. graduated from Nanjing University (Nanjing, China) with a bachelor’s degree in Computer Science and Technology.

From 2013 to 2016, she studied at Fudan University (Shanghai, China), where she obtained a master’s degree in computer science and software engineering.

Research experience:

From 2013 to 2016, she worked as a junior researcher at the Shanghai Key Laboratory of Intelligent Information Processing, Fudan University.

From 2016 to 2017, she worked as a junior researcher at the National Laboratory Astana, Nazarbayev University.

From 2017 to 2018, she worked as a researcher at the Knowledge Engineering Group, Tsinghua University.

Since 2018, she has been working as a researcher at the Institute of Information and Computational Technologies.

Her research interests are broad and include artificial intelligence, machine learning, optimization, representation learning, topic modeling, clustering, knowledge engineering, data mining, speech signal processing, and natural language processing.

Scientific works

  • Tolegen G., Toleu A., & Zheng X. (2016). Named entity recognition for Kazakh using conditional random fields. Proceedings of the 4th International Conference on Computer Processing of Turkic Languages (TurkLang 2016), Izvestija KGTU im. I. Razzakova, pp. 118–127

  • Toleu A., Tolegen G., & Makazhanov A. (2017). Character-aware neural morphological disambiguation. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), Association for Computational Linguistics, Vancouver, Canada, pp. 666–67. DOI: 10.18653/v1/P17-2105 (Scopus, Web of Science)
  • Toleu A., Tolegen G., Makazhanov A.: Character-based deep learning models for token and sentence segmentation. In: Proceedings of the 5th International Conference on Turkic Languages Processing (TurkLang 2017). Kazan, Tatarstan, Russian Federation (October 2017)

  • Toleu A., Tolegen G., Mussabayev R. KeyVector: Unsupervised Keyphrase Extraction Using Weighted Topic via Semantic Relatedness // Computación y Sistemas, 2019. Vol. 23(3). P. 861–869 // DOI: 10.13053/CyS-23-3-3264 (Scopus percentile = 24, Web of Science IF = 0.53)

  • Toleu A., Tolegen G., Mussabayev R. Comparison of Various Approaches for Dependency Parsing // 15th International Asian School-Seminar on Optimization Problems of Complex Systems (OPCS 2019), IEEE, 2019, Article number 8880244, pp. 192-195 (Scopus)

  • Tolegen G., Toleu A., Mamyrbayev O., Mussabayev R. Neural Named Entity Recognition for Kazakh. arXiv:2007.13626

  • Toleu A., Tolegen G., Mussabayev R. (2020) Deep Learning for Multilingual POS Tagging. In Advances in Computational Collective Intelligence. ICCCI 2020. Communications in Computer and Information Science (Scopus), vol 1287. Springer, Cham.

  • Mamyrbayev O., Toleu A., Tolegen G., & Mekebayev N. (2020). Neural architectures for gender detection and speaker identification. Cogent Engineering, 7:1. DOI: 10.1080/23311916.2020.1727168 (Scopus percentile = 69)

  • Tolegen G., Toleu A., Mussabayev R. Voted-Perceptron Approach for Kazakh Morphological Disambiguation // Proceedings of the 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020), Language Resources and Evaluation Conference (LREC 2020), European Language Resources Association (ELRA), pp. 258–264