The main objective of this project is to develop methods, models, and software tools for an end-to-end automatic speech recognition system for agglutinative (Turkic) languages, using Kazakh and Azerbaijani as examples.
Main design, technical, and economic indicators and efficiency: a new speech recognition technology, together with mathematical models, algorithms, and methods for the automatic analysis, synthesis, and recognition of speech signals.
The scientific novelty of the project lies in the study of existing mathematical models, algorithms, and software, and in the development of new ones, for building a new end-to-end speech recognition technology for agglutinative languages. A significant difference between this project and previous studies is its comprehensive and generalizing nature: it is aimed at creating an end-to-end speech recognition technology for solving problems in the field of speech technologies.
Scope of application: the results of the project can be used for speaker identification, language identification, and many subtasks of speech recognition, as well as in the development of artificial intelligence.
The main difficulty expected in solving these tasks is training artificial neural networks on large amounts of data. To reduce training time, high-performance computing on GPUs will be used.
The practical significance of the project, on both a national and an international scale, consists in implementing an end-to-end automatic speech recognition system using machine learning methods, and in developing new mathematical models and algorithms for a new automatic speech recognition technology for agglutinative languages, using Kazakh and Azerbaijani as examples. This speech recognition system can be used for Kazakh-Azerbaijani voice machine translation.
There is significant social demand for high-quality speech recognition technology among blind and visually impaired people, and the technology is widely used in mobile and household voice control applications. The economic benefit is expected to come from creating a new market in the field of speech recognition and from stimulating demand in the existing market of speech technologies for low-resource languages.
The ultimate goal is to create an effective algorithm, method, and software for end-to-end speech recognition of agglutinative languages.
D.O. Oralbekova, O.Zh. Mamyrbayev. Modern methods of speech recognition. News of Science of Kazakhstan, No. 1 (148), 2021, pp. 20-35. (In Russian.)
O.Zh. Mamyrbayev, A.S. Kydyrbekova, D.O. Oralbekova, B.Zh. Zhumazhanov, A. Bekarystankyzy. Development of an integral automatic speech recognition system for agglutinative languages. Institute of Information and Computational Technologies, CS MES RK, 2022, 104 pp. (In Russian.)
– an acoustic corpus for agglutinative languages, using Kazakh and Azerbaijani as examples,
– expansion of the existing speech corpus for the Kazakh language,
– collection of speech and text data for agglutinative languages, and expansion of the corpus to several thousand hours,
– development of methods and models based on CTC (connectionist temporal classification) and on an encoder-decoder with an attention mechanism.
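
As an illustration of the attention mechanism named in the last item, the following is a minimal sketch of a single attention step in an encoder-decoder model. It assumes scaled dot-product attention, which is only one common variant; the source does not say which variant the project uses. The names `softmax`, `attend`, `query`, and `encoder_states` are hypothetical and chosen for this example only.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """One decoder step of scaled dot-product attention (assumed variant).

    query          -- decoder state vector for the current output step
    encoder_states -- one vector per encoded audio frame
    Returns the attention weights and the weighted context vector.
    """
    scale = math.sqrt(len(query))
    # Similarity between the decoder state and every encoder frame.
    scores = [
        sum(q * h for q, h in zip(query, state)) / scale
        for state in encoder_states
    ]
    weights = softmax(scores)
    # Context vector: weighted average of the encoder frames.
    dim = len(encoder_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# Toy input: three encoded frames of dimension 2 (made-up values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
weights, context = attend(query, encoder_states)
```

In this toy input the first and third frames have the same similarity to the query, so they receive equal weight, and the second frame receives less. In a real recognizer the vectors would be learned representations of speech frames rather than hand-written numbers.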