Equivalent to 70,000 books: Ukrderzharkhiv transferred 10 terabytes of data to train Ukraine's AI Syaivo

A data array equivalent to 70,000 books will be used to train the national language model. This will help the AI better understand the Ukrainian context.

The State Archival Service of Ukraine has transferred 10 terabytes of data to train the AI model Syaivo. According to the Ministry of Digital Transformation, this volume of information is equivalent to 70,000 books, UNN reports.

Imagine a volume of information equivalent to 70,000 books. That is exactly how much the State Archival Service of Ukraine is transferring, for the first time, to train the national language model Syaivo: 10 terabytes of unique historical materials, state documents, and scientific texts.

- the statement says.

According to the Ministry of Digital Transformation, most global AI assistants generate responses in English and then translate them into Ukrainian, often losing context in the process.

To make Syaivo a reliable source of information for people and businesses, we are training it on Ukrainian data. For this, the model will study historical sources, manuscripts, laws, court decisions, media materials, and dictionaries.

- the statement says.

The agency added that the creation of a large language model is an important step towards building AI sovereignty.

Currently, we are collecting high-quality data for the model. More than 50 partners, including media, universities, libraries, etc., are already providing their materials. As soon as the model starts working, we will publish a complete list of institutions and people who helped create the national Ukrainian AI.

- the statement says.

Acting Minister of Digital Transformation of Ukraine Oleksandr Bornyakov noted that "to train the national language model, we are collecting data so that the model learns from a unique array of information."

These are state documents, scientific articles, media materials, dictionaries, historical materials, laws, court decisions, etc.

- Bornyakov added.

The Ministry of Digital Transformation adds that the involvement of such data accelerates the creation of a high-quality Ukrainian model that will understand our history and context.

This is a unique case: for the first time, Ukrderzharkhiv is providing its data for the development of digital services in Ukraine. We have a large array of data from different historical eras, both printed and handwritten, in Ukrainian and other languages. By the end of 2026, the number of digital copies of state archival documents will grow from 150 million to over 200 million, one of the highest rates of archival digitization in the world.

- added Anatoliy Khromov, head of Ukrderzharkhiv.
