1) Mamyrbaev O.Zh. – Head of Research, Deputy General Director, PhD, Senior Researcher (https://orcid.org/0000-0001-8318-3794)
2) Khairova N.F. – Doctor of Technical Sciences, Professor, Chief Researcher (https://orcid.org/0000-0002-9826-0286)
3) Sharonova N.V. – Doctor of Technical Sciences, Professor, Chief Researcher (https://orcid.org/0000-0002-7555-1507)
4) Mukhsina K.Zh. – PhD, Senior Researcher (https://orcid.org/0000-0002-8627-1949)
5) Ybytaeva G.S. – Junior Researcher (https://orcid.org/0000-0002-4243-0928)
6) Kartbaev A.Zh. – PhD, Senior Researcher (https://orcid.org/0000-0003-0592-5865)
To develop an information model for the automatic identification of illegal texts in Kazakh, Russian and English on the Internet. The information model includes the “Illegal Internet Content” ontology, specialized text corpora and software tools designed to support government analysts in identifying texts with illegal content.
The novelty of the project lies in a new integrated approach to the semantic analysis of Internet text content, which combines machine learning methods with reinforcing, differentiating features obtained from the ontology of the subject area.
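As a rough illustration of how ontology-derived features might reinforce ordinary lexical features, consider the minimal sketch below. It is not the project's implementation: the ontology fragment `ONTOLOGY_TERMS`, the concept names and the function `extract_features` are all hypothetical, and the resulting feature dictionary is only assumed to be passed on to some machine learning classifier.

```python
import re
from collections import Counter

# Hypothetical ontology fragment (illustrative only): concept -> terms.
ONTOLOGY_TERMS = {
    "weapons": {"rifle", "ammunition"},
    "fraud": {"phishing", "counterfeit"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def extract_features(text):
    """Combine bag-of-words counts with per-concept ontology hit counts."""
    tokens = tokenize(text)
    # Ordinary lexical features: one count per token.
    features = Counter(f"tok={t}" for t in tokens)
    # Reinforcing features: how many tokens fall under each ontology concept.
    for concept, terms in ONTOLOGY_TERMS.items():
        hits = sum(1 for t in tokens if t in terms)
        if hits:
            features[f"onto={concept}"] = hits
    return dict(features)


features = extract_features("Selling counterfeit documents and phishing kits")
```

In this sketch the two terms from the assumed "fraud" concept produce a single aggregated feature, so a classifier can generalize over terms it has rarely seen individually.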
The project also includes the development of a method for automatically generating a linguistic ontology “Illegal Internet Content” based on a logical-linguistic model for extracting facts from unstructured documents.
This model makes it possible to automate the population of the ontology with entities and the relationships between them, extracted from the created text corpora of criminally colored texts.
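The population step described above can be sketched as follows. This is a simplified illustration under stated assumptions: the logical-linguistic model itself is not shown, extracted facts are assumed to arrive as (subject, predicate, object) triples, and the `Ontology` class and `populate` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """A toy ontology: a set of entities and a set of relation triples."""
    entities: set = field(default_factory=set)
    relations: set = field(default_factory=set)

    def add_fact(self, subject, predicate, obj):
        """Register both entities and the relationship between them."""
        self.entities.update((subject, obj))
        self.relations.add((subject, predicate, obj))


def populate(ontology, facts):
    """Add extracted facts to the ontology, normalizing case so that
    repeated mentions of the same fact are stored only once."""
    for subject, predicate, obj in facts:
        ontology.add_fact(subject.lower(), predicate.lower(), obj.lower())
    return ontology


# Assumed output of a fact extractor run over a corpus (illustrative only).
facts = [
    ("Seller", "offers", "counterfeit documents"),
    ("seller", "offers", "Counterfeit documents"),
    ("seller", "uses", "forum"),
]
onto = populate(Ontology(), facts)
```

Storing entities and relations in sets means that the same fact found in several corpus texts does not create duplicates; a real system would likely also need lemmatization and cross-language entity alignment, which this sketch omits.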
The project is expected to develop, for the first time in the Republic of Kazakhstan, an ontology of the subject area of illegal Internet text content for three languages: Kazakh, Russian and English. It should be noted that openly available sources worldwide do not describe such ontologies in enough detail for practical application.
Models and methods of automatic search and analysis of illegal textual information in the Kazakh, Russian and English languages based on the ontological approach.
The implementation of this project will increase the efficiency of semantic processing of texts in Kazakh, Russian and English. The highly specialized ontology “Illegal Internet Content” created in the project is a new linguistic resource for the Kazakh language, which increases the scientific potential of subsequent developments.
Law enforcement and special government organizations; social services; educational institutions and other government institutions.
The main results expected in the course of the project:
1) a basic terminological thesaurus of the illegal vocabulary of the Kazakh, Russian and English languages, representing a meta-ontology of a limited size and structure;
2) extended corpora of criminally significant texts from online group discussion communities;
3) a method of automatic ontology generation based on the available corpora and the developed approach for open-domain event extraction (OdEE) from text.