February 2025

AI vs AI: Scientists Develop Neural Networks to Detect Generated Text Insertions

A research team, including Alexander Shirnin from HSE University, has developed two models designed to detect AI-generated insertions in scientific texts. The AIpom system integrates two types of models: a decoder and an encoder. The Papilusion system is designed to detect text that neural networks have modified through synonym replacement or summarisation, and it uses a single type of model: encoders. In the future, these models will assist in verifying the originality and credibility of scientific publications. Articles describing the Papilusion and AIpom systems have been published in the ACL Anthology Digital Archive.

As language models like ChatGPT and GigaChat become more popular and widely used, it is becoming increasingly difficult to distinguish original human-written text from AI-generated content. Artificial intelligence is already being used to write scientific publications and graduation papers, so tools capable of identifying AI-generated insertions in texts are needed. A research team, including scientists from HSE University, presented their solutions at the SemEval-2024 and DAGPap24 international scientific competitions.

The AIpom model was used to identify the boundaries between original and generated fragments in scientific papers. The proportion of machine-generated text to the author's text varied from paper to paper. To train the models, the organisers provided texts on the same topic. At the evaluation stage, however, the topics changed, making the task more challenging.


'Models perform well on familiar topics, but their performance declines when presented with new topics,' says Alexander Shirnin, co-author of the paper and Research Assistant at the Laboratory for Models and Methods of Computational Pragmatics, HSE Faculty of Computer Science. 'It's like a student who, having learned how to solve one type of problem, struggles to solve a problem on an unfamiliar topic or from a different subject as easily or accurately.'

To improve the system's performance, the researchers combined two models: a decoder and an encoder. In the first stage, a decoder neural network received an instruction and the source text as input and returned the text fragment it presumed to be AI-generated. Then, in the original text, the position where the model predicted the generated fragment to begin was marked with a special <BREAK> token. The encoder processed the text marked up in the first stage and refined the decoder's predictions. To do this, it classified each token (the smallest unit of text, such as a word or part of a word) as either written by a human or generated by AI. This approach improved accuracy compared with systems that used only one type of model: AIpom ranked second at the SemEval-2024 competition.
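The two-stage markup described above can be illustrated with a minimal Python sketch. It is not the AIpom implementation: the function names, the sample sentence, and the hard-coded "decoder output" and "encoder labels" are hypothetical stand-ins for what trained models would produce. The sketch only shows how a <BREAK> marker might be placed at a predicted boundary and how per-token labels translate back into a boundary position.

```python
BREAK = "<BREAK>"


def mark_predicted_boundary(source: str, predicted_fragment: str) -> str:
    """Insert BREAK where the stage-one fragment begins in the source text.

    If the fragment is empty or cannot be found, the text is returned unmarked.
    """
    if not predicted_fragment:
        return source
    start = source.find(predicted_fragment)
    if start == -1:
        return source
    return source[:start] + BREAK + " " + source[start:]


def boundary_from_token_labels(labels: list[int]) -> int:
    """Return the index of the first token labelled 1 (generated).

    Returns len(labels) when no token is labelled as generated.
    """
    for index, label in enumerate(labels):
        if label == 1:
            return index
    return len(labels)


source = "We measured the samples twice. The results demonstrate a robust trend."

# Hypothetical stage-one output: the fragment a decoder presumes to be generated.
fragment = "The results demonstrate a robust trend."
marked = mark_predicted_boundary(source, fragment)

# Hypothetical stage-two output: one label per whitespace-separated word,
# 0 = human-written, 1 = AI-generated.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
boundary = boundary_from_token_labels(labels)

print(marked)
print(boundary)
```

In a real system the labels would be assigned to subword tokens rather than whole words, so mapping them back to word positions would need an extra alignment step that this sketch omits.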

The Papilusion model also distinguished between human-written and generated text. It classified sections of the text into four categories: written by a human, modified with synonyms, generated by a model, or summarised by a model. The task was to identify each category accurately. The number of categories and the length of insertions varied from text to text.

In this case, the developers used three models, all of the same type: encoders. Each was trained independently of the others to predict one of the four categories for every token in the text. Prediction errors were penalised during training, and the models were fine-tuned with their lower layers frozen.

'Each model has a different number of layers, depending on its architecture. When training a model, we can leave the first ten or so layers unchanged and adjust only the parameters in the last two layers. This is done to prevent losing important data embedded in the first layers during training,' explains Alexander Shirnin. 'It can be compared to an athlete who makes an error in the movement of their hand. We only need to explain this part to them, rather than resetting their entire learning and retraining them, as they might forget how to move correctly overall. The same logic applies here. The method is not universal and may not work with all models, but in our case, it was effective.'
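The idea of freezing lower layers can be shown with a small, framework-free Python sketch. It is an illustration only, not the researchers' training code: the `Layer` class, the twelve-layer toy model, and the gradient values are hypothetical. A real setup would typically use a deep learning library's own mechanism for excluding parameters from updates.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    weights: list[float]
    trainable: bool = True


def freeze_lower_layers(layers: list[Layer], num_trainable: int) -> None:
    """Mark only the last `num_trainable` layers as trainable; freeze the rest."""
    if num_trainable < 0:
        raise ValueError("num_trainable must be non-negative")
    cutoff = max(len(layers) - num_trainable, 0)
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff


def sgd_step(layers: list[Layer], grads: list[list[float]], lr: float) -> None:
    """Apply one gradient-descent update, skipping frozen layers."""
    for layer, layer_grads in zip(layers, grads):
        if not layer.trainable:
            continue  # frozen layer: its weights stay exactly as they were
        layer.weights = [w - lr * g for w, g in zip(layer.weights, layer_grads)]


# Toy model with twelve layers; the first ten are frozen, the last two are tuned.
layers = [Layer(f"layer_{i}", [1.0, 1.0]) for i in range(12)]
freeze_lower_layers(layers, num_trainable=2)

# Hypothetical gradients: identical for every layer to make the effect visible.
grads = [[0.5, 0.5] for _ in layers]
sgd_step(layers, grads, lr=0.5)

for layer in layers:
    print(layer.name, layer.trainable, layer.weights)
```

Only the last two layers change after the update, which mirrors the point made in the quote: what the earlier layers have already learned is left untouched.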

The three encoders independently determined the category for each token. The system's final prediction was the category that received the most votes from the three models. Papilusion ranked sixth out of 30 in the DAGPap24 competition.
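Combining the three encoders' outputs can be sketched as a per-token majority vote. This is one plausible reading of the description above, not the published Papilusion code: the category names, the sample predictions, and the tie-breaking rule (a tie goes to the label seen first, that is, the first model's label) are assumptions made for illustration.

```python
from collections import Counter

# The four categories named in the article; the short labels are hypothetical.
CATEGORIES = ("human", "synonym", "generated", "summarised")


def majority_vote(predictions: list[list[str]]) -> list[str]:
    """Pick, for every token, the label predicted by the most models.

    `predictions` holds one label sequence per model, all of equal length.
    Ties are resolved in favour of the label encountered first.
    """
    if len({len(sequence) for sequence in predictions}) != 1:
        raise ValueError("all models must label the same number of tokens")
    voted = []
    for token_votes in zip(*predictions):
        counts = Counter(token_votes)
        voted.append(counts.most_common(1)[0][0])
    return voted


# Hypothetical per-token predictions from three independently trained encoders.
model_a = ["human", "human", "generated", "summarised"]
model_b = ["human", "synonym", "generated", "generated"]
model_c = ["synonym", "synonym", "generated", "human"]

final = majority_vote([model_a, model_b, model_c])
print(final)
```

With three voters and four categories a three-way tie is possible, as on the last token here, so any real ensemble needs an explicit tie-breaking rule.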

According to the researchers, current AI detection models perform reasonably well but still have limitations. Primarily, they struggle to process data beyond what they were trained on, and overall, there is a lack of diverse data to train the models effectively.

'To obtain more data, we need to focus on collecting it. Both companies and laboratories have been doing this. Specifically for this type of task, it is necessary to collect datasets that include texts modified using multiple AI models and modification methods,' the researcher comments. 'Instead of continuing a text using just one model, more realistic scenarios should be created, such as asking the model to add to the text, rewrite the beginning for better coherence, remove parts of it, or generate a portion of the text in a new style using a different prompt. Of course, it is also important to collect data in different languages and on a variety of topics.'

Date

27 February

Topics

Research & Expertise

Keywords

artificial intelligence, frontiers of science

About

Faculty of Computer Science, Laboratory for Models and Methods of Computational Pragmatics

About persons

Alexander Shirnin
