The State Archival Service of Ukraine has transferred 10 terabytes of data to train the Syaivo AI model. According to the Ministry of Digital Transformation, that volume of information is equivalent to 70,000 books, UNN reports.
Imagine a volume of information equivalent to 70,000 books. That is how much data the State Archival Service of Ukraine is transferring for the first time to train the national language model Syaivo: 10 terabytes of unique historical materials, state documents, and scientific texts.
As reported by the Ministry of Digital Transformation, most global AI assistants generate responses in English and translate these texts into Ukrainian, often losing context.
To make Syaivo a reliable source of information for people and businesses, we are training it on Ukrainian data. For this, the model will study historical sources, manuscripts, laws, court decisions, media materials, and dictionaries.
In addition
The agency added that the creation of a large language model is an important step towards building AI sovereignty.
Currently, we are collecting high-quality data for the model. More than 50 partners, including media, universities, libraries, etc., are already providing their materials. As soon as the model starts working, we will publish a complete list of institutions and people who helped create the national Ukrainian AI.
Acting Minister of Digital Transformation of Ukraine Oleksandr Bornyakov noted that "for training the national language model, we are collecting data so that the language model is trained on a unique array of information."
These are state documents, scientific articles, media materials, dictionaries, historical materials, laws, court decisions, etc.
The Ministry of Digital Transformation adds that using such data speeds up the creation of a high-quality Ukrainian model that will understand our history and context.
This is a unique case: Ukrderzharkhiv is providing its data for the development of digital services in Ukraine for the first time. We have a large array of data from different historical eras, printed and handwritten, in Ukrainian and other languages. By the end of 2026, the number of digital copies held by state archives will increase from 150 million to over 200 million, one of the highest rates of archival heritage digitization in the world.