Project No. AP08855743: Development of an end-to-end automatic speech recognition system for agglutinative languages (2020-2022), Institute of Information and Computational Technologies

The project objective:

The main objective of this project is to develop methods, models, and software tools for an end-to-end automatic speech recognition system for agglutinative (Turkic) languages, using Kazakh and Azerbaijani as examples.

Main design, technical and economic indicators, and efficiency: a new speech recognition technology, together with mathematical models, algorithms, and methods for the automatic analysis, synthesis, and recognition of speech signals.

The scientific novelty of the project lies in studying existing mathematical models, algorithms, and software, and in developing new ones, in order to create a new end-to-end speech recognition technology for agglutinative languages. This project differs significantly from previous studies in its comprehensive and generalizing nature: it aims to create an end-to-end speech recognition technology that addresses a range of problems in the field of speech technologies.

Scope of application: the corpus can be used for speaker identification, language identification, and many subtasks of speech recognition, as well as in the development of artificial intelligence.

Project tasks:

Development of an acoustic corpus for agglutinative (Turkic) languages, using Kazakh and Azerbaijani as examples. This task involves expanding the existing speech corpus, collecting speech and text data for agglutinative languages, and adding data to grow the corpus to several thousand hours;
Three types of models will be developed for the end-to-end system: a speech recognition model based on Connectionist Temporal Classification (CTC); an encoder-decoder speech recognition model with an attention mechanism, using techniques for stabilizing and regularizing neural networks, data augmentation during training, and parts of words (subword units) as the network's outputs; and a Conditional Random Field (CRF) model for speech recognition;
A transfer learning method will be implemented to adapt models trained on Kazakh data to the Azerbaijani dataset;
Rules will be developed for transcribing Kazakh and Azerbaijani words for an automatic transcription system;
An effective algorithm and software tools for end-to-end recognition of agglutinative (Turkic) languages, using Kazakh and Azerbaijani as examples, will be developed from the models and methods obtained during the study.
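The CTC model in the task list above outputs, for every audio frame, a probability distribution over the output symbols plus a special "blank" symbol. A minimal sketch of best-path (greedy) CTC decoding, assuming the blank occupies index 0 and using a toy three-symbol alphabet with made-up probabilities: take the most probable symbol in each frame, merge consecutive repeats, then drop blanks. Practical systems often use beam search, possibly with a language model, instead; this is an illustration of the technique, not the project's decoder.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_probs, alphabet):
    """Best-path CTC decoding.

    frame_probs: one list of symbol probabilities per audio frame.
    alphabet:    maps symbol index -> output character (index 0 = blank).
    """
    # 1. Most probable symbol in each frame (argmax).
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Merge runs of the same symbol into one.
    collapsed = [symbol for symbol, _ in groupby(best_path)]
    # 3. Remove blanks and map indices to characters.
    return "".join(alphabet[s] for s in collapsed if s != BLANK)


# Toy example (hypothetical numbers): symbols are blank, Cyrillic "а", "т".
alphabet = ["<blank>", "а", "т"]
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.1, 0.8],
    [0.3, 0.1, 0.6],
    [0.6, 0.2, 0.2],
    [0.1, 0.7, 0.2],
]
print(ctc_greedy_decode(frames, alphabet))
```

Here the best path is а, а, blank, т, т, blank, а, which collapses to "ата". A blank between two identical symbols is what allows CTC to emit a doubled letter.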
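The task list above also mentions using parts of words as the network's outputs. For agglutinative languages this keeps the output inventory small, because a long word form such as Kazakh "кітаптарымызда" ("in our books") is a stem followed by suffixes. A minimal sketch of greedy longest-match segmentation, assuming a small hand-picked subword vocabulary (hypothetical; real systems usually learn the vocabulary from data, for example with byte-pair encoding or a unigram language model):

```python
def segment(word, vocab):
    """Greedy longest-match segmentation of a word into subword units.

    At each position the longest vocabulary entry that matches is taken;
    if nothing matches, a single character is emitted so that any word
    can still be segmented.
    """
    max_len = max((len(unit) for unit in vocab), default=1)
    units, i = [], 0
    while i < len(word):
        # Try candidate end positions from longest to shortest.
        for j in range(min(len(word), i + max_len), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                units.append(piece)
                i = j
                break
    return units


# Hypothetical subword vocabulary: one stem and a few Kazakh suffixes.
vocab = {"кітап", "тар", "лар", "ым", "ymыз", "ымыз", "да", "та"}
print(segment("кітаптарымызда", vocab))
```

With this vocabulary the word splits into the stem "кітап" and the suffixes "тар" (plural), "ымыз" (our), and "да" (locative). Preferring the longest match is why "ымыз" wins over the shorter "ым".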
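Data augmentation is listed above among the training techniques. One widely used option for speech, named here as an example rather than as the project's confirmed choice, is SpecAugment-style masking: a random band of mel-frequency bins and a random span of frames are blanked out so the model cannot rely on any single region of the spectrogram. A minimal stdlib sketch with one frequency mask and one time mask (no time warping), assuming mean-normalized features so that masking with 0.0 equals masking with the mean; the width limits are hypothetical defaults:

```python
import random


def spec_augment(spec, max_freq_width=2, max_time_width=3, rng=random):
    """Apply one frequency mask and one time mask (SpecAugment-style, no time warping).

    spec: spectrogram as a list of frames, each a list of mel-bin values.
    Returns a masked copy; the input is not modified.
    """
    out = [row[:] for row in spec]
    n_frames, n_bins = len(out), len(out[0])
    # Frequency mask: f consecutive bins starting at f0, zeroed in every frame.
    f = rng.randint(0, min(max_freq_width, n_bins))
    f0 = rng.randint(0, n_bins - f)
    # Time mask: t consecutive frames starting at t0, zeroed in every bin.
    t = rng.randint(0, min(max_time_width, n_frames))
    t0 = rng.randint(0, n_frames - t)
    for i in range(n_frames):
        for j in range(n_bins):
            if f0 <= j < f0 + f or t0 <= i < t0 + t:
                out[i][j] = 0.0
    return out


spec = [[1.0] * 8 for _ in range(10)]  # toy 10-frame, 8-bin spectrogram
masked = spec_augment(spec, rng=random.Random(0))
```

Because a fresh mask is drawn for every training example, the same utterance yields many slightly different inputs across epochs.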
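The transfer learning step in the task list above can be illustrated with one common recipe, shown here as an assumption rather than as the project's exact procedure: keep the layers of a model trained on Kazakh, replace the output layer because the Azerbaijani symbol inventory differs from the Kazakh one, and freeze the reused layers during the first stage of fine-tuning on Azerbaijani data. In this dependency-free sketch a model is simply a dict of named weight matrices; the layer names "encoder" and "output" and the initialization scale are hypothetical.

```python
import copy
import random


def transfer_to_new_language(pretrained, new_vocab_size, seed=0):
    """Build initial parameters for a target-language model from a source model.

    pretrained: dict mapping layer name -> weight matrix (list of rows).
                "output" has one row per source-language symbol and one
                column per hidden unit.
    Returns (params, frozen): reused layers are deep-copied and listed in
    `frozen`; the output layer is re-initialised for the new inventory.
    """
    rng = random.Random(seed)
    params = {name: copy.deepcopy(w) for name, w in pretrained.items() if name != "output"}
    hidden_size = len(pretrained["output"][0])
    # Fresh small random weights: one row per target-language output symbol.
    params["output"] = [
        [rng.gauss(0.0, 0.02) for _ in range(hidden_size)] for _ in range(new_vocab_size)
    ]
    frozen = set(params) - {"output"}
    return params, frozen


# Toy "Kazakh" model: 3 hidden units, 4 output symbols (hypothetical sizes).
kazakh_model = {
    "encoder": [[0.5, -0.1, 0.2] for _ in range(3)],
    "output": [[0.1, 0.2, 0.3] for _ in range(4)],
}
azerbaijani_init, frozen = transfer_to_new_language(kazakh_model, new_vocab_size=6)
```

In a real training loop the frozen layers would receive no gradient updates at first and could be unfrozen later for full fine-tuning once the new output layer has adapted.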

The main challenge in these tasks is training artificial neural networks on large amounts of data. To reduce training time, high-performance computing on GPUs will be used.

Practical significance:

On both a national and an international scale, the practical significance of the project lies in implementing an end-to-end automatic speech recognition system with machine learning methods and in developing new mathematical models and algorithms for a new automatic speech recognition technology for agglutinative languages, using Kazakh and Azerbaijani as examples. The resulting speech recognition system can be used for Kazakh-Azerbaijani voice machine translation.

There is significant social demand for high-quality voice-based speech recognition among visually impaired and blind people, and the technology is widely used in mobile and household voice-control applications. Economic benefits are expected from the creation of a new market in speech recognition and from increased demand in the existing market for speech technologies for low-resource languages.

The ultimate goal is to create an effective algorithm, method and software for the end-to-end recognition of agglutinative languages.

Publications:

Oralbekova D.O., Mamyrbayev O.Zh. Modern methods of speech recognition. News of Science of Kazakhstan, No. 1 (148), 2021, pp. 20-35.

Mamyrbayev O.Zh., Kydyrbekova A.S., Zhumazhanov B.Zh., Oralbekova D.O. Voice recognition using x-vectors. Bulletin of the Almaty University of Power Engineering and Telecommunications, No. 1 (52), 2021, pp. 69-77.
Copyright certificate “System for automatic recognition of Kazakh speech based on an end-to-end architecture”, No. 15501, dated 25.02.2021. Authors: Mamyrbayev O.Zh., Oralbekova D.O., Kydyrbekova A.S., Zhumazhanov B.Zh., Turdalykyzy T.
Mamyrbayev O.Zh., Oralbekova D.O., Kydyrbekova A.S., Zhumazhanov B.Zh., Turdalykyzy T. An end-to-end hybrid model based on CTC and the attention mechanism for Kazakh continuous speech recognition. Proceedings of the International Scientific and Practical Conference “Satpayev Readings 2021”, Vol. 2, Almaty, 2021, pp. 48-52.
Mamyrbayev O., Oralbekova D., Kydyrbekova A., Turdalykyzy T., Bekarystankyzy A. End-to-end model based on RNN-T for Kazakh speech recognition. 2021 3rd International Conference on Computer Communication and the Internet (ICCCI), 2021, pp. 163-167. https://doi.org/10.1109/ICCCI51764.2021.9486811
Mamyrbayev O., Kydyrbekova A., Alimhan K., Oralbekova D., Zhumazhanov B., Nuranbayeva B. Development of security systems using DNN and i & x-vector classifiers. Eastern-European Journal of Enterprise Technologies, 4 (9 (112)), 2021, pp. 32-45. https://doi.org/10.15587/1729-4061.2021.239186
Mamyrbayev O.Zh., Oralbekova D.O., Othman M., Tulendiev D.M., Zhumazhanov B., Turdalykyzy T. Kazakh speech recognition based on an end-to-end RNN-T model. VI International Scientific and Practical Conference “Informatics and Applied Mathematics”, 29 September to 1 October 2021, Almaty, Kazakhstan, pp. 322-327.
Mamyrbayev O.Zh., Oralbekova D.O., Othman M., Tulendiev D.M., Zhumazhanov B., Turdalykyzy T. Study of an attention-based end-to-end model for automatic Kazakh speech recognition. Proceedings of the International Scientific Conference on Information Technologies dedicated to the 75th anniversary of Professor U.A. Tukeyev, 8 October 2021, Almaty, Kazakhstan, pp. 86-89.
Mahambetova U., Estemesov Z., Nuranbayeva B., Sadykov P., Mamyrbayev O., Oralbekova D. Development and research of the influence of the composition and concentration of activators on the strength of phosphorus slag binders. Eastern-European Journal of Enterprise Technologies, 5 (6 (113)), 2021, pp. 54-61. https://doi.org/10.15587/1729-4061.2021.242814
Mamyrbayev O., Oralbekova D., Alimhan K., Othman M., Zhumazhanov B. Realization of online systems for automatic speech recognition. News of the National Academy of Sciences of the Republic of Kazakhstan, Vol. 6, No. 340, 2021, pp. 66-72. https://doi.org/10.32014/2020.2518-1726.64
Mamyrbayev O., Alimhan K., Oralbekova D., Bekarystankyzy A., Zhumazhanov B. Identifying the influence of transfer learning method in developing an end-to-end automatic speech recognition system with a low data level. Eastern-European Journal of Enterprise Technologies, 1 (9 (115)), 2022, pp. 84-92. https://doi.org/10.15587/1729-4061.2022.252801
Mamyrbayev O.Zh., Oralbekova D.O., Alimhan K., Othman M., Zhumazhanov B. Application of a hybrid end-to-end model for Kazakh speech recognition. News of the National Academy of Sciences of the Republic of Kazakhstan, Vol. 1, No. 341, 2022, pp. 58-68. https://doi.org/10.32014/2022.2518-1726.117

Copyright certificates:

“System for automatic recognition of Kazakh speech based on an end-to-end architecture”, No. 15501, dated 25.02.2021. Authors: Mamyrbayev O.Zh., Oralbekova D.O., Kydyrbekova A.S., Zhumazhanov B.Zh., Turdalykyzy T.
“System for identification and authentication through speech technologies”, No. 23323, dated 04.02.2022. Authors: Oralbekova D.O., Mamyrbayev O.Zh., Alimhan K., Kydyrbekova A.S., Zhumazhanov B.Zh., Turdalykyzy T.
“System for automatic recognition of Kazakh continuous speech based on a model with an attention mechanism”, No. 24178, dated 05.03.2022. Authors: Mamyrbayev O.Zh., Oralbekova D.O., Alimhan K., Kydyrbekova A.S., Zhumazhanov B.Zh., Turdalykyzy T.

Monographs:

Mamyrbayev O.Zh., Kydyrbekova A.S., Oralbekova D.O., Zhumazhanov B.Zh., Bekarystankyzy A. Development of an end-to-end automatic speech recognition system for agglutinative languages. Institute of Information and Computational Technologies, Committee of Science of the Ministry of Education and Science of the Republic of Kazakhstan, 2022, 104 pp.

The results obtained:

– an acoustic corpus for agglutinative languages, using Kazakh and Azerbaijani as examples,

– expansion of the existing speech corpus for the Kazakh language,

– collection of speech and text data for agglutinative languages and expansion of the corpus to several thousand hours,

– development of methods and models based on CTC and on encoder-decoder architectures with an attention mechanism.
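The encoder-decoder models listed above use an attention mechanism to decide, at each decoding step, which encoder frames to focus on. A minimal plain-Python sketch of scaled dot-product attention for a single decoder query; this is one common form of attention, shown for illustration and not as the exact variant used in the project (additive and location-aware attention are also common in speech recognition). The toy vectors are made up.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one decoder query.

    query:  vector of length d (decoder state).
    keys:   one length-d vector per encoder frame.
    values: one vector per encoder frame (all the same length).
    Returns (context vector, attention weights over encoder frames).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Context = attention-weighted sum of the value vectors.
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return context, weights


# Toy example: the query is closer to the first encoder frame.
context, weights = attention(
    [1.0, 0.0], keys=[[2.0, 0.0], [0.0, 2.0]], values=[[1.0, 0.0], [0.0, 1.0]]
)
```

The weights always sum to one, so the context vector is a weighted average of the encoder frames; here most of the weight falls on the first frame because its key is aligned with the query.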