Project Manager – PhD Rakhimova Diana Ramazanovna
Senior Research Fellow, PhD A.S. Karibayeva
Senior Research Fellow, PhD M. Turdalyuly
Senior Research Fellow, Candidate of Technical Sciences Y.R. Suleimenov
Junior Research Fellow A.O. Turganbayeva
Junior Research Fellow A. Suleimenova
Software Engineer N. Lonovenko
Software Engineer D. Suleimenov
The goal of the project is to create technologies (algorithms, methods and electronic resources) for a system that processes the state language and supports its study, using modern artificial intelligence methods and approaches adapted to the peculiarities of the Kazakh language.
To achieve this goal, it is necessary to solve the following main tasks:
– Creation of large data sets both for user training tasks and for artificial intelligence tasks such as machine translation, speech recognition and deep learning.
– Development of an intelligent “alignment” algorithm for identifying parallel pairs of sentences in parallel texts.
– Development of an automated morphological analyzer for text processing.
– Development and integration of services and modules for studying the Kazakh language with machine translation and speech recognition systems.
– Creation of Internet services and applications for the practical use of the obtained tools and algorithms in real life.
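To illustrate what the sentence “alignment” task above involves, here is a minimal sketch of monotonic, length-based alignment by dynamic programming. It is not the project's algorithm, which is not described here. It assumes only that translated sentences tend to have similar character lengths. The function name `align_sentences` and the `skip_cost` parameter are hypothetical, and the default value is arbitrary.

```python
from typing import List, Tuple


def align_sentences(src: List[str], tgt: List[str],
                    skip_cost: float = 0.5) -> List[Tuple[str, str]]:
    """Pair sentences in order, allowing unmatched sentences on either side.

    Matching src[i] with tgt[j] costs the relative difference of their
    character lengths; leaving a sentence unmatched costs `skip_cost`.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    def relax(i, j, new_cost, prev):
        # Keep the cheapest way found so far to reach cell (i, j).
        if new_cost < cost[i][j]:
            cost[i][j] = new_cost
            back[i][j] = prev

    for i in range(n + 1):
        for j in range(m + 1):
            here = cost[i][j]
            if i < n and j < m:
                a, b = len(src[i]), len(tgt[j])
                mismatch = abs(a - b) / max(a, b, 1)
                relax(i + 1, j + 1, here + mismatch, (i, j, True))
            if i < n:  # source sentence left without a partner
                relax(i + 1, j, here + skip_cost, (i, j, False))
            if j < m:  # target sentence left without a partner
                relax(i, j + 1, here + skip_cost, (i, j, False))

    # Walk the back-pointers from the end to recover the matched pairs.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj, matched = back[i][j]
        if matched:
            pairs.append((src[pi], tgt[pj]))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

A practical aligner would typically combine length with lexical evidence, for example bilingual dictionary entries, rather than rely on length alone.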
The following scientific and technical results were obtained:
– over 100 thousand short texts in the Kazakh language: news, materials from magazines, etc.;
– over 300 books in the Kazakh language by Kazakh and foreign authors, including fiction, collections of songs, books on self-development, business, etc.;
– over 2 million Kazakh-Russian parallel sentences;
– 200 thousand Kazakh-Russian dictionary entries.
For Kazakh language processing tools, approaches based on neural networks and deep learning were developed and the following work was implemented:
The research was accompanied by software implementation of the proposed approaches and testing of the algorithms. The results were evaluated using standard metrics such as BLEU and TER (commonly used for machine translation) and WER (commonly used for speech recognition).
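As an illustration of one of these metrics, WER (word error rate) is the word-level edit distance between a reference and a system output, divided by the number of reference words. The sketch below is a minimal, assumed implementation with whitespace tokenization. The function name `word_error_rate` is hypothetical, and this is not the project's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word and one dropped word against a 4-word reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Note that WER can exceed 1.0 when the output contains many more words than the reference.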
The practical result of the project is a web application called “Oqulyq”. The results of the research carried out within this project were tested and introduced into the educational process at two universities. At Al-Farabi KazNU, they are used in the disciplines “Language Resources”, “Machine Translation Technologies” and “Machine Learning in Natural Language Processing” of the master’s program 7M06101-“Computational Linguistics”. At the International University of Engineering and Technology, they are used in the discipline “Foreign Language” (professional) for first-year master’s students of the programs 7M06101-“Software Engineering” and 7M07204-“Technology and Engineering of Food Production”.
Based on the results of the project for 2021-2023, 26 works were published, including 6 in foreign publications indexed in the WoS and/or Scopus databases and 2 in domestic publications recommended by CQASES MES RK (Committee for Quality Assurance in the Sphere of Education and Science of the Ministry of Education and Science of the Republic of Kazakhstan). One monograph was published in a domestic publication and one collective monograph in a foreign publication. Three copyright certificates were received for the developed computer programs. The results of the study were presented at international conferences and scientific seminars.