
Project: №АР05132950. Development of an information and analytical data retrieval system in the Kazakh language

Project manager and members:

The project manager is PhD Rakhimova Diana Ramazanovna.

Key members of the study group:

Doctor of technical sciences, Professor Tukeev Walsher Anuarbekovich,

Junior researcher Zhumanov Zh.M.,

Junior researcher Shormakova A.N.,

Engineer Turganbayeva A.O.,

Engineer Abduali B.,

Engineer Amirova D.

The aim of the project:

To develop effective algorithms and models for text data processing, based on modern natural language processing technologies and recent advances in computational linguistics, in order to extract new information and knowledge from unstructured sources, large data sets, and texts in the Kazakh language.

To achieve this goal, the project solved the following tasks:

A complete classification system for the endings and suffixes of the Kazakh language was developed. Based on this classification of endings, a lexicon-free algorithm was developed. Its distinctive features are its speed and the relative ease with which it can be reproduced.
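As a rough illustration of what a lexicon-free, ending-based approach can look like, the sketch below strips endings from a word without consulting any dictionary. It assumes the algorithm works as a stemming step, which the text above does not state. The ending tables are a small illustrative subset chosen for this example, not the project's full classification, and the names `PLURAL`, `CASE`, `MIN_STEM`, `strip_one`, and `stem` are hypothetical.

```python
# Minimal sketch of lexicon-free suffix stripping (assumption: not the project's code).
# The ending tables are a small illustrative subset of Kazakh noun endings.
PLURAL = ("лар", "лер", "дар", "дер", "тар", "тер")
CASE = (
    "ның", "нің", "дың", "дің", "тың", "тің",  # genitive
    "дан", "ден", "тан", "тен", "нан", "нен",  # ablative
    "мен", "бен", "пен",                        # instrumental
    "ға", "ге", "қа", "ке",                     # dative
    "да", "де", "та", "те",                     # locative
    "ды", "ді", "ты", "ті", "ны", "ні",         # accusative
)

MIN_STEM = 2  # never strip an ending if fewer than two letters would remain


def strip_one(word, endings):
    """Remove the longest matching ending, if the remaining stem is long enough."""
    for ending in sorted(endings, key=len, reverse=True):
        if word.endswith(ending) and len(word) - len(ending) >= MIN_STEM:
            return word[: -len(ending)]
    return word


def stem(word):
    """Strip at most one case ending, then at most one plural ending."""
    word = word.lower()
    word = strip_one(word, CASE)    # outermost layer: case ending
    word = strip_one(word, PLURAL)  # next layer: plural ending
    return word


print(stem("кітаптарды"))  # "books" (accusative) -> кітап
print(stem("балаларға"))   # "to the children"    -> бала
```

Because no lexicon is consulted, a stripper of this kind is fast and easy to reproduce, but it can remove a letter sequence that only looks like an ending. A complete classification of endings and their allowed combinations is what would keep such errors in check.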

A model and system for an annotated corpus of the Kazakh language were developed. Their distinctive feature is a set of data processing modules (tokenization, lemmatization, morphological analysis) that take the specifics of the Kazakh language into account.

An algorithm for the automatic replenishment of texts in the Kazakh language and an algorithm for indexing documents by their attributes were developed.

A knowledge base of synonyms and phrases for the Kazakh language was developed, with phrases classified by their structural formation and by purpose. It improves the quality of the information-analytical search system.

A module for information-analytical processing was developed as an application software solution for various purposes, using artificial intelligence to process and analyze both structured and unstructured big data. The algorithms and methods of this module can be applied individually or in combination to analyze large volumes of text data:

– An algorithm for extracting keywords (key phrases) from documents in the Kazakh language;

– An algorithm for semantic analysis of text using machine learning;

– A method for summarizing text in the Kazakh language.
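The keyword extraction item in the list above can be illustrated with a standard TF-IDF ranking. The text does not say which technique the project used, so TF-IDF here is a stand-in. The function names `tokenize` and `top_keywords` and the sample documents are hypothetical.

```python
# Minimal TF-IDF keyword ranking sketch (assumption: a generic technique,
# not necessarily the algorithm developed in the project).
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def top_keywords(docs, doc_index, k=3):
    """Return the k highest-scoring terms of docs[doc_index] by TF-IDF."""
    tokenized = [tokenize(d) for d in docs]

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    n_docs = len(docs)
    tf = Counter(tokenized[doc_index])
    total = sum(tf.values())

    # Score = relative term frequency * log(inverse document frequency).
    scores = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [
    "қазақ тілі мен әдебиеті",
    "қазақ тілі іздеу жүйесі іздеу",
    "ауа райы мен жаңалықтар",
]
print(top_keywords(docs, 1, k=2))
```

In practice the tokens would first pass through stemming or lemmatization, so that different inflected forms of a Kazakh word count as one term.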

The architecture was designed and a prototype of the information-analytical retrieval system was developed, taking into account modern technologies and methods in information retrieval and semantic processing of natural language. Sub-modules of the system's information retrieval module have been developed. A flexible system architecture was chosen as the technological solution. All program modules of the system are connected through integration modules (intermediate data storages), which act as links between them and yield a loosely coupled architecture. This design approach makes the modules relatively easy to scale and upgrade.
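The loosely coupled design described above can be sketched as modules that never call each other directly and exchange data only through an intermediate store. This is a simplified illustration under assumptions: `IntermediateStore` is an in-memory stand-in for whatever storage the system uses, and `run_module` and the stage names are hypothetical.

```python
# Minimal sketch of modules linked through an intermediate data store
# (assumption: an in-memory stand-in, not the system's actual storage).
from typing import Callable, Dict, List


class IntermediateStore:
    """Holds named collections of records that modules read from and write to."""

    def __init__(self) -> None:
        self._data: Dict[str, List[str]] = {}

    def put(self, key: str, records: List[str]) -> None:
        self._data[key] = list(records)

    def get(self, key: str) -> List[str]:
        return list(self._data.get(key, []))


def run_module(store: IntermediateStore, source: str, target: str,
               transform: Callable[[str], str]) -> None:
    """Read records from `source`, transform each, and write them to `target`."""
    store.put(target, [transform(r) for r in store.get(source)])


store = IntermediateStore()
store.put("raw", ["Қазақ тілі", "Іздеу жүйесі"])

# Each module knows only its input and output collections, not the other modules.
run_module(store, "raw", "normalized", str.lower)
run_module(store, "normalized", "tokens", lambda s: " | ".join(s.split()))

print(store.get("tokens"))
```

Since a module depends only on the names of its input and output collections, it can be replaced or run on separate hardware without changes to the others, which is the scalability and upgradability benefit the paragraph refers to.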


The main results of the project’s research and technical activities are presented in the following publications:

Practical results

Kazakh ASR

As a result of this research, a speech recognition mobile application for teaching the Kazakh language has been implemented. The application, KazVoice, was developed by IICT and is available to users in test mode. An internet connection is required to use it. To record speech, the user presses the microphone button, and the speech signal is captured from the microphone. The signal is then processed automatically, and the user sees the recognition result as text.