HSE University Develops Tool for Assessing Text Complexity in Low-Resource Languages

An installation at the National Library of the Republic of Tatarstan celebrating the history of Tatar writing, featuring symbols from various alphabets

An installation at the National Library of the Republic of Tatarstan celebrating the history of Tatar writing, featuring symbols from various alphabets
© Wikimedia Commons

Researchers at the HSE Centre for Language and Brain have developed a tool for assessing text complexity in low-resource languages. The first version supports several of Russia’s minority languages, including Adyghe, Bashkir, Buryat, Tatar, Ossetian, and Udmurt. This is the first tool of its kind designed specifically for these languages, taking into account their unique morphological and lexical features.

According to the Institute of Linguistics of the Russian Academy of Sciences, 155 languages are spoken in Russia. Some of them are used by relatively small communities—for example, around 80,000 people speak Adyghe, while 250,000 to 350,000 people speak Buryat, Ossetian, and Udmurt. Other languages, such as Bashkir and Tatar, have more than one million native speakers. All of these languages hold official status in various republics of Russia, making it essential not only to preserve them but also to create conditions for their development, including opportunities for learning and use in education and science. 

In 2025, a Presidential Decree approving the Fundamentals of the State Language Policy of the Russian Federation was adopted. It affirms linguistic diversity and outlines a strategy for the development and practical use of the languages spoken by the peoples of Russia. One way to advance these goals is to create digital tools that make working with low-resource languages easier and more accessible.

A team of scientists at the HSE Centre for Language and Brain has developed an online text complexity calculator for quick and easy assessment of text difficulty in several minority languages, taking into account their linguistic features. The calculator is based on Textometr, a tool created by Antonina Laposhina and Maria Lebedeva for evaluating the complexity of Russian-language texts.

The calculator developed by psycholinguists at HSE University evaluates texts across several parameters: word length and word frequency, based on data from language corpora; the percentage of vocabulary covered by the frequency list (i.e., the share of words in the text that appear among the 5,000 most frequent words in the respective language); and the distribution of parts of speech within the text. In addition, the calculator considers lexical density and diversity, as well as the text's narrativity and descriptiveness.
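To make a few of these parameters concrete, here is a minimal Python sketch. It is not the calculator's code: it assumes a naive regex tokenizer in place of real morphological analysis and a plain set standing in for the 5,000-word frequency list, and the function names are hypothetical. Lexical density and part-of-speech distribution require a morphological tagger, so they are left out.

```python
import re

# ASSUMPTION: a word is any run of Unicode letters (Latin or Cyrillic).
# A real tool would use a language-specific tokenizer and morphological analyzer.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text: str) -> list[str]:
    """Return lowercase word tokens (hypothetical, naive helper)."""
    return [w.lower() for w in WORD_RE.findall(text)]


def text_metrics(text: str, frequent_words: set[str]) -> dict[str, float]:
    """Compute simple surface metrics of text complexity for one text."""
    words = tokenize(text)
    if not words:
        raise ValueError("text contains no word tokens")
    n = len(words)
    return {
        # Average word length in characters.
        "mean_word_length": sum(len(w) for w in words) / n,
        # Share of tokens that appear in the frequency list
        # (in the tool, the 5,000 most frequent words of the language).
        "frequency_coverage": sum(w in frequent_words for w in words) / n,
        # Type-token ratio: distinct word forms / all tokens,
        # one simple measure of lexical diversity.
        "type_token_ratio": len(set(words)) / n,
    }
```

For example, `text_metrics("The cat saw the dog.", {"the", "cat"})` gives a mean word length of 3.0, a frequency coverage of 0.6 (three of five tokens are in the list), and a type-token ratio of 0.8.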

The key innovation is the Flesch Reading Ease formula, adapted separately for each language, which makes it possible to assess text complexity and readability more accurately.

The Flesch score is based on the number of words, sentences, and syllables, but the original coefficients were developed for English and do not work well for structurally different languages—such as the polysynthetic Adyghe language, in which the average word is much longer. In a 2025 study, Uliana Petrunina and Nina Zdorova recalculated the formula’s coefficients specifically for Adyghe, which significantly improved the accuracy of the readability assessment.
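For reference, Flesch's original English formula is FRE = 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words), where higher scores indicate easier text. The sketch below keeps the same form but makes the three coefficients swappable, which is the kind of per-language adaptation described above. The recalculated Adyghe coefficients are not given here, so only the English values appear. Counting one syllable per Cyrillic vowel letter is an assumption made for illustration, not the tool's syllabification method, and all names are hypothetical.

```python
from dataclasses import dataclass

# ASSUMPTION: one vowel letter = one syllable. This only approximates
# syllable counts for Cyrillic-script languages.
CYRILLIC_VOWELS = set("аеёиоуыэюя")


@dataclass(frozen=True)
class FleschCoefficients:
    intercept: float
    sentence_weight: float  # multiplies average words per sentence
    syllable_weight: float  # multiplies average syllables per word


# Flesch's original coefficients, fitted for English.
ENGLISH = FleschCoefficients(206.835, 1.015, 84.6)


def count_syllables(word: str) -> int:
    """Approximate syllables as the number of Cyrillic vowel letters (hypothetical helper)."""
    return sum(ch in CYRILLIC_VOWELS for ch in word.lower())


def flesch_reading_ease(words: int, sentences: int, syllables: int,
                        coef: FleschCoefficients = ENGLISH) -> float:
    """Flesch Reading Ease with per-language coefficients."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    avg_sentence_length = words / sentences
    avg_syllables_per_word = syllables / words
    return (coef.intercept
            - coef.sentence_weight * avg_sentence_length
            - coef.syllable_weight * avg_syllables_per_word)
```

With English coefficients, a text of 100 words, 5 sentences and 150 syllables scores about 59.6. In a language with very long words, the syllables-per-word term grows, so English weights push scores down regardless of actual difficulty; re-estimating the coefficients corrects for this.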

Uliana Petrunina

'The parameters of our calculator are adapted to the structural features of each of the six low-resource languages of Russia, using text corpora as well as frequency and morphological analyses. We also adapted the classic Flesch Reading Ease score. As a result, the algorithm can be easily reconfigured for other low-resource languages, regardless of their typological characteristics,' explains Uliana Petrunina, Research Fellow at the HSE Centre for Language and Brain and one of the developers of the tool.
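The article does not describe how the coefficients were recalculated. As one plausible approach, and purely as an assumption, the three Flesch coefficients can be refit by ordinary least squares against reference texts whose difficulty has already been rated. The sketch below does this in plain Python by solving the normal equations; the function names and the use of rated reference texts are hypothetical.

```python
def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: samples are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_flesch_coefficients(samples):
    """Fit (intercept, sentence_weight, syllable_weight) by least squares.

    samples: iterable of (avg_words_per_sentence, avg_syllables_per_word, rated_score).
    Model: score = intercept - sentence_weight * ASL - syllable_weight * ASW.
    """
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    n = 0
    for asl, asw, y in samples:
        row = (1.0, asl, asw)
        for i in range(3):
            xty[i] += row[i] * y
            for j in range(3):
                xtx[i][j] += row[i] * row[j]
        n += 1
    if n < 3:
        raise ValueError("need at least three samples")
    b0, b1, b2 = solve_linear(xtx, xty)
    # Flesch subtracts both terms, so the fitted slopes change sign.
    return b0, -b1, -b2
```

Under this assumption, adapting the score to a new language means collecting rated texts in that language and refitting the three numbers, while the scoring function itself stays unchanged.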

The tool will help create comparable stimulus materials for linguistic experiments and provide teachers with a resource for selecting high-quality educational materials by difficulty level. This solution represents an important contribution to the preservation and development of Russia’s minority languages and to supporting the country’s linguistic diversity. 

Nina Zdorova

'Our tool allows researchers and teachers to select materials based on their linguistic complexity, which is particularly important for research and education in languages with limited resources,' says Nina Zdorova, one of the creators of the tool.

Future versions are expected to include additional low-resource languages that are underrepresented in linguistics, both in Russia and beyond.