Yearly Report 2025

Members

Faculty Members

  • Damianos Chatziantoniou
  • Maria Kechagia
  • Dimitris Mitropoulos
  • Panos (Panagiotis) Louridas
  • Diomidis Spinellis

Senior Researchers

  • Nikolaos Alexopoulos
  • Vaggelis Atlidakis
  • Makrina Viola Kosti
  • Zoe Kotti
  • Vasiliki Efstathiou
  • Stefanos Georgiou
  • Thodoris Sotiropoulos
  • Marios Fragkoulis

Associate Researchers

  • Andreas Lampropoulos
  • Muhammad Irfan Khalid
  • Konstantina Panagopoulou
  • Charalambos-Ioannis Mitropoulos
  • Stefanos Chaliasos
  • Tushar Sharma
  • Konstantina Dritsa

Researchers

  • Maria-Eliza Patska
  • Tasos Gerasis
  • Cristian Scobioala
  • Giorgos Karandreas
  • Lazaros Giannoulakos
  • Ioanna Vougiatzi
  • Dimitrios Papakyriakopoulos
  • Konstantinos Karakatsanis
  • Dimitrios Chatzipavlis
  • Gregory Alexandrou
  • Marek Horvath
  • Maria Gaitani
  • Nikolaos Bellos
  • Andriana Bilali
  • George Flourakis
  • Theofanis Orfanoudakis
  • Simos Athanasiadis
  • Evgenia Pampidi
  • Foivos-Timotheos Proestakis
  • Efthymios Kontoes
  • Panagiotis Daskalopoulos
  • Georgios Liargkovas
  • Ioanna Moraiti
  • Evangelos Talos
  • Giannis Karyotakis
  • Ilias Mpourdakos
  • Apostolos Garos
  • Chris Lazaris
  • Rafaila Galanopoulou
  • Evangelia Panourgia
  • Christina Zacharoula Chaniotaki
  • Georgios-Petros Drosos
  • George Theodorou
  • Christos Pappas
  • Angeliki Papadopoulou
  • George Metaxopoulos
  • Theodosis Tsaklanos
  • Michael Loukeris
  • Marios Papachristou
  • Christos Chatzilenas
  • Ioannis Batas
  • Efstathia Chioteli
  • Vitalis Salis

Overview in Numbers

New Publications
  Monographs and Edited Volumes: 0
  PhD Theses: 0
  Journal Articles: 8
  Book Chapters: 0
  Conference Publications: 3
  Technical Reports: 0
  White Papers: 0
  Magazine Articles: 0
  Working Papers: 0
  Datasets: 0
  Total New Publications: 11

Projects
  New Projects: 0
  Ongoing Projects: 2
  Completed Projects: 0

Members
  Faculty Members: 5
  Senior Researchers: 8
  Associate Researchers: 7
  Researchers: 43
  Total Members: 63
  New Members: 11

PhDs
  Ongoing PhDs: 2
  Completed PhDs: 1

New Seminars
  New Seminars: 12

New Publications

Journal Articles

    • Diomidis Spinellis. Rewriting the Unix stream editor in Rust. IEEE Software, 42(5):21–25, September 2025.
    • Diomidis Spinellis. Modernizing a security alarm system. IEEE Software, 42(3):18–21, May 2025.
    • Diomidis Spinellis. False authorship: an explorative case study around an AI-generated article published under my name. Research Integrity and Peer Review, May 2025.
    • Diomidis Spinellis. Efficient graph processing. IEEE Software, 42(1):22–25, January 2025.
    • Diomidis Spinellis. Designing a programmable stream editor. IEEE Software, 42(6):23–27, November 2025.
    • Diomidis Spinellis. Analyzing Linux on a supercomputer. IEEE Software, 42(2):18–23, March 2025.
    • Ioannis Karyotakis, Evangelos Talos, and Diomidis Spinellis. TGIF: the evolution of developer commit times. Empirical Software Engineering, December 2025.
    • Artun Boz, Wouter Zorgdrager, Zoe Kotti, Jesse Harte, Panos Louridas, Vassilios Karakoidas, Dietmar Jannach, and Marios Fragkoulis. Improving sequential recommendations with LLMs. ACM Transactions on Recommender Systems, January 2025.

Conference Publications

    • Konstantinos Karakatsanis, Georgios Alexopoulos, Ioannis Karyotakis, Foivos Timotheos Proestakis, Evangelos Talos, Panos Louridas, and Dimitris Mitropoulos. PyTrim: a practical tool for reducing Python dependency bloat. In Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering, ASE 2025: Tool Demonstrations Track. IEEE, October 2025.
    • Pavlína Wurzel Gonçalves, Pooja Rani, Margaret-Anne Storey, Diomidis Spinellis, and Alberto Bacchelli. Code review comprehension: reviewing strategies seen through code comprehension theories. In 2025 IEEE/ACM 33rd International Conference on Program Comprehension (ICPC), 589–601. IEEE, April 2025. ACM SIGSOFT Distinguished Paper Award.
    • Konstantinos Eleftheriou, Panos Louridas, and John Pavlopoulos. KostasThesis2025 at SemEval-2025 task 10 subtask 2: a continual learning approach to propaganda analysis in online news. In Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, and Marcos Zampieri, editors, Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), 899–908. Vienna, Austria, July 2025. Association for Computational Linguistics.

New Members

    • Maria-Eliza Patska
    • Tasos Gerasis
    • Andreas Lampropoulos
    • Giorgos Karandreas
    • Lazaros Giannoulakos
    • Ioanna Vougiatzi
    • Dimitrios Papakyriakopoulos
    • Konstantinos Karakatsanis
    • Dimitrios Chatzipavlis
    • Gregory Alexandrou
    • Marek Horvath

Ongoing PhDs

    • Marek Horvath. Topic: Authorship Attribution in Software Engineering
    • Konstantina Dritsa. Topic: Data Science

Completed PhDs

    • Zoe Kotti. Topic: Data Analysis Applications in Software Engineering

Seminars

      Prompt Stability Scoring for Text Annotation with Large Language Models

      Date: 13 January 2025
      Presenter: Christopher Barrie, NYU
      Abstract

      Researchers are increasingly using language models (LMs) for text annotation. These approaches rely only on a prompt telling the model to return a given output according to a set of instructions. The reproducibility of LM outputs may nonetheless be vulnerable to small changes in the prompt design. This calls into question the replicability of classification routines. To tackle this problem, researchers have typically tested a variety of semantically similar prompts to determine what we call "prompt stability." These approaches remain ad hoc and task-specific. In this article, we propose a general framework for diagnosing prompt stability by adapting traditional approaches to intra- and inter-coder reliability scoring. We call the resulting metric the Prompt Stability Score (PSS) and provide a Python package, PromptStability, for its estimation. Using six different datasets and twelve outcomes, we classify >150k rows of data to: a) diagnose when prompt stability is low; and b) demonstrate the functionality of the package. We conclude by providing best practice recommendations for applied researchers.
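      The reliability-scoring idea can be sketched with a toy computation. The following is a simplified stand-in for the PSS, using plain pairwise percent agreement rather than the Krippendorff's-alpha-style metric the PromptStability package implements; the function name and data are illustrative, not taken from the package:

```python
from itertools import combinations

def prompt_stability_score(annotations):
    """Average pairwise agreement between the label vectors produced by
    semantically similar prompts over the same items (simplified
    percent-agreement stand-in for the PSS described in the talk)."""
    pairs = list(combinations(annotations, 2))
    total = 0.0
    for run_a, run_b in pairs:
        # Fraction of items on which the two prompt paraphrases agree.
        total += sum(a == b for a, b in zip(run_a, run_b)) / len(run_a)
    return total / len(pairs)

# Three paraphrased prompts labelling the same five items:
runs = [
    ["pos", "neg", "pos", "neg", "pos"],
    ["pos", "neg", "pos", "pos", "pos"],
    ["pos", "neg", "neg", "pos", "pos"],
]
print(prompt_stability_score(runs))
```

      A score near 1.0 indicates that paraphrasing the prompt barely changes the annotations; low scores flag classification routines whose replicability is in doubt.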

      Bio: Christopher Barrie is Assistant Professor of Sociology at NYU. He is also Core Faculty at CSMaP and Research Fellow at the Department of Sociology, University of Oxford.


      Test-based Patch Clustering for Automatically-Generated Patches Assessment

      Date: 07 February 2025
      Presenter: Maria Kechagia, UoA
      Abstract

      Previous studies have shown that Automated Program Repair (APR) techniques suffer from the overfitting problem. Overfitting happens when a patch is run and the test suite does not reveal any error, but the patch actually does not fix the underlying bug or it introduces a new defect that is not covered by the test suite. Therefore, the patches generated by APR tools need to be validated by human programmers, which can be very costly and prevents APR tool adoption in practice. Our work aims to minimize the number of plausible patches that programmers have to review, thereby reducing the time required to find a correct patch. We introduce a novel lightweight test-based patch clustering approach called xTestCluster, which clusters patches based on their dynamic behavior. xTestCluster is applied after the patch generation phase in order to analyze the generated patches from one or more repair tools, to provide more information about those patches, and to facilitate patch assessment. The novelty of xTestCluster lies in using information from the execution of newly generated test cases to cluster patches generated by multiple APR approaches. A cluster is formed of patches that fail on the same generated test cases. The output from xTestCluster gives developers a) a way of reducing the number of patches to analyze, as they can focus on analyzing a sample of patches from each cluster, and b) additional information (new test cases and their results) attached to each patch. After analyzing 902 plausible patches from 21 Java APR tools, our results show that xTestCluster is able to reduce the number of patches to review and analyze by a median of 50%. xTestCluster can save a significant amount of time for developers who have to review the multitude of patches generated by APR tools, and provides them with new test cases that expose the differences in behavior between generated patches. Moreover, xTestCluster can complement other patch assessment techniques that help detect patch misclassification.

      URL: https://link.springer.com/article/10.1007/s10664-024-10503-2

      Preprint: https://arxiv.org/pdf/2207.11082
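      The core clustering step, grouping patches by their failure signature on the generated tests, can be sketched in a few lines. The patch names and test signatures below are hypothetical; xTestCluster itself derives them by executing generated test cases:

```python
from collections import defaultdict

def cluster_patches(failures):
    """Group plausible patches by the set of generated test cases they
    fail: patches with identical failure signatures behave alike, so a
    reviewer can inspect one representative per cluster."""
    clusters = defaultdict(list)
    for patch, failing_tests in failures.items():
        clusters[frozenset(failing_tests)].append(patch)
    return list(clusters.values())

# Hypothetical failure signatures for four plausible patches:
signatures = {
    "patch-1": {"test_a"},
    "patch-2": {"test_a"},
    "patch-3": {"test_a", "test_b"},
    "patch-4": set(),  # fails no generated test
}
clusters = cluster_patches(signatures)
print(len(clusters))  # 3 clusters instead of 4 individual patches
```

      With real data this is how the median 50% reduction arises: reviewers sample one patch per cluster rather than reading every plausible patch.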

      Biography

      Dr Maria Kechagia is an Assistant Professor in Software Engineering at the National and Kapodistrian University of Athens within the Department of Business Administration. From May 2019 to November 2024, she was a research fellow at University College London, in the UK. Previously, she was a postdoctoral researcher at the Delft University of Technology, in the Netherlands. She obtained a PhD degree from the Athens University of Economics and Business and an MSc degree from Imperial College London. Her research interests include software verification (static and dynamic analysis), automated program repair, software analytics, and software optimisation (energy efficiency and runtime performance). She has been a programme committee member of the research track of top software engineering venues including ICSE, FSE, ASE, ISSTA, MSR, ICSME, ESEM, and SANER, and a reviewer for top software engineering journals including TSE, TOSEM, EMSE, and JSS. She is a member of the editorial board of TSE.


      Evidence Standards Improve Reliability in Scholarly Peer Review

      Date: 17 February 2025
      Presenter: Dr. Paul Ralph, Dalhousie University
      Abstract

      Scholarly peer review is “the lynchpin about which the whole business of science is pivoted” (Ziman 1968). Most researchers believe peer review is effective (Ware 2008), but empirical research consistently shows that reviewers cannot reliably distinguish methodologically sound from fundamentally flawed studies (Cole 1981; Peters & Ceci 1982; Lock 1991; Rothwell and Martyn 2000; Price 2014; Ralph 2016). Consequently, we created comprehensive evidence standards and tools to improve peer review in software engineering and related fields. Objective. The objective of this study is to investigate the impact of evidence standards on scholarly peer review. Method. A randomized controlled experiment was conducted at an A-ranked software engineering conference. The program committee was randomly divided into two groups: one using a typical conference review process; the other using a standardized process based on the ACM SIGSOFT Empirical Standards for Software Engineering Research (https://acmsigsoft.github.io/EmpiricalStandards/). Results. Evidence standards significantly improve inter-reviewer reliability without harming authors’ or reviewers’ attitudes toward the review process. Reviewers using evidence standards gave more praise and focused more on research methods than style. Discussion. Asking reviewers to write free-text comments about a paper and score it on a 6-point scale from strong reject to strong accept produces data statistically indistinguishable from random noise. This means that decisions are determined entirely by reviewer selection, not the merits of the research. Conventional review processes are therefore scientifically and morally indefensible. While not a silver bullet, evidence standards significantly improve reliability, and the data collected in this study facilitates further refinement of the standards and tooling toward still greater reliability.

      Biography

      Dr. D. Paul Ralph, PhD (British Columbia), B.Sc. / B.Comm (Memorial), is an award-winning scientist, author, consultant, and Professor of Software Engineering at Dalhousie University. His cutting-edge research at the intersection of software engineering, human-computer interaction, and project management explores the relationship between software teams’ social dynamics and success. It has been used by many leading technology companies including Adobe, Amazon, AT&T, Canon, BEA Systems, Broadcom, IBM, Google, HP, Microsoft, Netflix, PayPal, Samsung, Salesforce, Yahoo!, and Walmart. He has published more than 80 peer-reviewed articles in premier venues including IEEE Transactions on Software Engineering and the ACM/IEEE International Conference on Software Engineering. Dr. Ralph is editor-in-chief of the SIGSOFT Empirical Standards for Software Engineering Research.


      Reconstructing Android Application I/O Behaviors from Kernel Traces

      Date: 30 April 2025
      Presenter: Nikos Alexopoulos
      Abstract

      Android users face increasingly sophisticated threats, ranging from malware and state-sponsored surveillance to supply chain attacks and a large attack surface consisting mostly of proprietary components. Android’s semantic gap, i.e., the disconnect between application behaviors and kernel-level events (system calls), is a major limiting factor towards developing approaches capable of detecting threats in the wild. This talk will present recent research on overcoming this limitation, introducing SysDroid, a simple and lightweight approach to reconstruct Android behaviors from Linux kernel traces.

      SysDroid builds on two key insights: (a) I/O events can be captured in the kernel and attributed to applications by following IPC edges, and (b) a mapping between I/O events and interesting high-level behaviors can be established a priori by associating I/O events to high-level Android Framework API calls. The approach is effective in capturing application behaviors and can be used as the basis for further analysis.
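      Insight (b), the a-priori mapping from I/O events to high-level behaviors, can be illustrated with a toy sketch. The event tuples and behavior names below are invented for illustration and are not SysDroid's actual trace format:

```python
# Hypothetical a-priori mapping from kernel-visible I/O events
# (syscall, resource) to high-level Android behaviors.
BEHAVIOR_SIGNATURES = {
    ("open", "/data/data/contacts.db"): "read contacts",
    ("connect", "443"): "network access",
    ("open", "/dev/camera0"): "camera use",
}

def reconstruct_behaviors(trace):
    """Map the I/O events attributed to one application back to the
    high-level behaviors they are known, a priori, to correspond to."""
    return [BEHAVIOR_SIGNATURES[e] for e in trace if e in BEHAVIOR_SIGNATURES]

# A (toy) per-application kernel trace after IPC attribution:
trace = [("open", "/dev/camera0"), ("read", "fd3"), ("connect", "443")]
print(reconstruct_behaviors(trace))  # ['camera use', 'network access']
```

      The real system establishes this mapping by associating I/O events with Android Framework API calls, so the table is derived rather than hand-written as here.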


      From Heuristics to Autonomous Agents: Preliminary Results in LLM-Powered OS Tuning

      Date: 09 July 2025
      Presenter: Georgios Liargkovas, Columbia University
      Abstract

      For decades, OS tuning has relied on static heuristics that cannot adapt to dynamic, complex workloads. While machine learning offered a path forward, traditional models like Bayesian Optimization and Reinforcement Learning introduced their own challenges: a "semantic gap" preventing true contextual understanding, brittle reward engineering, and inefficient exploration unfit for live systems. This talk argues that Large Language Models (LLMs) represent the next leap forward. We present preliminary, promising results from an LLM-powered autonomous agent that leverages reasoning and pre-trained knowledge to overcome these limitations.

      We will conclude by discussing future research directions for these emerging autonomous systems.

      Biography

      Georgios Liargkovas is a PhD student at Columbia University advised by Kostis Kaffes. His research focuses on OS scheduling and AI/ML for OS Optimization. He holds a BS in Management Science and Technology from Athens University of Economics and Business, where he conducted empirical software engineering research at BALab advised by Diomidis Spinellis.


      New Insights and Perspectives on Software Reliability, Analysis, and Security

      Date: 24 September 2025
      Presenter: Thodoris Sotiropoulos, ETH
      Abstract

      I will present the research directions our team pursued during the academic year 2024--2025 in the areas of software reliability, analysis, and security. For software reliability, we developed new methods to validate (i.e., find bugs in) critical software infrastructure, focusing on (1) static analyzers, which are widely used throughout the software development pipeline, and (2) Infrastructure as Code (IaC) programs, which are routinely used to automate the provisioning and management of entire computing infrastructures and servers.

      For software analysis and security, we investigated the security challenges of applications that combine high-level languages (e.g., Python, JavaScript) with low-level components (e.g., C, Rust). We introduced techniques to automatically identify and reason about the bridges between these languages. This enables powerful cross-language analyses such as vulnerability detection and reachability analysis in hybrid programs. Finally, we investigated an emerging domain: the effect of compiler optimizations on Zero-Knowledge Virtual Machines (zkVMs). zkVMs are becoming foundational in privacy-preserving and verifiable computation. Therefore, understanding the limitations of existing compiler infrastructures on zkVM performance opens new research directions, including the development of zkVM-specific passes, backends, and superoptimizers.


      Identifying Code Authorship through Static Analysis and Behavioral Biometrics

      Date: 06 October 2025
      Presenter: Marek Horváth, Technical University of Košice, Slovakia
      Abstract

      The seminar will introduce a research direction focused on authorship attribution in software engineering, exploring how the combination of source code stylometry and behavioral biometrics can be used to distinguish individual programmers. The talk will summarize the current state of a doctoral project in this domain, including applied methods and early findings. It will also briefly present related research activities conducted at the Technical University of Košice, with a special emphasis on applications in programming education and academic integrity.

      Biography

      Marek Horváth is a PhD student at the Technical University of Košice, Slovakia. His research focuses on authorship identification in software engineering using static code analysis and behavioral biometrics. He also works on educational applications of these methods to support students and instructors in programming courses.


      PyTrim: A Practical Tool for Reducing Python Dependency Bloat

      Date: 20 October 2025
      Presenter: Konstantinos Karakatsanis
      Abstract

      Dependency bloat is a persistent challenge in Python projects that increases maintenance costs and security risks. While numerous tools exist for detecting unused dependencies in Python, removing these dependencies across the source code and configuration files of a project requires manual effort and expertise. To tackle this challenge, we introduce PyTrim, an end-to-end system that automates this process. PyTrim eliminates unused imports and package declarations across a variety of file types, including Python source and configuration files such as requirements.txt and setup.py. PyTrim’s modular design makes it agnostic to the source of dependency-bloat information, enabling integration with any detection tool. Beyond automation, PyTrim also incorporates a novel dynamic analysis component that improves dependency detection recall. Our evaluation of PyTrim’s end-to-end effectiveness on a ground-truth dataset of 37 merged pull requests from prior work shows that PyTrim achieves 98.3% accuracy in replicating human-made changes. To show its practical impact, we run PyTrim on 971 open-source packages, identifying and trimming bloated dependencies in 39 of them. For each case, we submit a corresponding pull request, 6 of which have already been accepted and merged. PyTrim is available as an open-source project, encouraging community contributions and further development.
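      The detection step that such tools automate can be sketched with a toy AST pass. This is a much-simplified illustration, not PyTrim's implementation; PyTrim additionally edits configuration files and performs the actual removal:

```python
import ast

def unused_imports(source):
    """Report imported names that are never referenced in the module:
    a toy version of the unused-dependency detection a tool like
    PyTrim builds on."""
    tree = ast.parse(source)
    imported, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # The bound name is the alias if present, else the
                # top-level package name.
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)

code = "import os\nimport json\nprint(json.dumps({}))\n"
print(unused_imports(code))  # ['os']
```

      A real tool must also handle dynamic imports, re-exports, and side-effect-only imports, which is why detection recall (and hence the dynamic analysis component mentioned above) matters.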


      PyXray: Practical Cross-Language Call Graph Construction through Object Layout Analysis

      Date: 03 November 2025
      Presenter: Georgios Alexopoulos, UoA
      Abstract

      A great number of software packages combine code in high-level languages, such as Python, with binary extensions compiled from low-level languages such as C, C++ or Rust to either boost efficiency or enable specific functionalities. In this context, high-level function calls can trigger native (binary) code execution. This setup introduces challenges for call graph generation. Accurate call graphs are essential for various applications, including vulnerability management and software maintenance, as they help track execution paths, assess security risks, and identify unused or redundant code.

      This work tackles the problem of cross-language call graph construction in Python. Instead of relying on static analysis, which struggles with identifying Python-native interactions, we propose a dynamic analysis technique which does not require inputs to execute code. Our approach is based on two key insights: (1) when a binary extension is imported from Python code, all its objects (e.g., functions) are loaded into memory, and (2) the layout of callable Python objects contains pointers to the native functions they invoke.

      By analyzing these memory layouts for every loaded object, we identify corresponding graph edges, which link Python functions to the native functions they eventually invoke. This is an essential element for constructing call graphs across language boundaries. We implement this approach in PyXray, a tool that efficiently analyzes massive Python packages such as NumPy and PyTorch in minutes, while significantly outperforming existing static analysis methods in terms of precision and recall.

      PyXray enables two key applications: (1) cross-language vulnerability management, by identifying whether a Python package potentially calls a vulnerable native function and (2) cross-language bloat analysis, by quantifying unnecessary code across Python and native components.
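      The first insight, that importing a binary extension loads its native callables into memory, can be illustrated with a small sketch. PyXray goes further and inspects the objects' memory layouts to recover the native function pointers; this sketch only identifies which callables of an imported extension module (here the C-implemented math module) are native:

```python
import math
import types

def native_callables(module):
    """List the callables of an imported module that are implemented in
    native code: these are the objects whose layouts a tool like PyXray
    analyzes to recover Python-to-native call edges."""
    return sorted(
        name for name, obj in vars(module).items()
        if isinstance(obj, types.BuiltinFunctionType)
    )

print(native_callables(math)[:3])
```

      Every name this returns is a Python-visible entry point whose body executes outside the interpreter, i.e., exactly the boundary a cross-language call graph has to cross.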


      TGIF: The Evolution of Developer Commit Times

      Date: 24 November 2025
      Presenter: Ioannis Karyotakis and Evangelos Talos
      Abstract

      Understanding the evolving patterns of developer coding activity within the programming industry can help promote both individual well-being and organizational productivity. We examine the evolution of commit activity among developers over the past decade through an analysis of commit data from 4,549 GitHub repositories. Our findings show a subtle but consistent increase in the proportion of nighttime and weekend commits, particularly during early morning hours, indicating a shift toward more flexible and asynchronous work habits. In contrast, commit patterns across weekdays have remained stable, with no statistically significant differences between individual workdays from 2015 to 2024. These trends suggest a gradual departure from the conventional 9-to-5, Monday-to-Friday structure, with developers increasingly distributing their work across broader time frames. Our findings have practical implications for developers, who can use them to advocate for flexible work policies; for managers, who can better align schedules with real-world behaviors; and for researchers, who can further explore how temporal work patterns influence productivity and well-being.
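      The night/weekend classification underlying such an analysis can be sketched as follows. The bucket boundaries here are illustrative choices, not necessarily those used in the study:

```python
from datetime import datetime

def commit_bucket(ts):
    """Classify a commit timestamp (in the author's local time) as
    weekend, nighttime (22:00-05:59 on a weekday), or regular
    weekday work."""
    if ts.weekday() >= 5:            # Saturday (5) or Sunday (6)
        return "weekend"
    if ts.hour >= 22 or ts.hour < 6:
        return "night"
    return "weekday"

commits = [
    datetime(2025, 3, 14, 23, 15),   # Friday night
    datetime(2025, 3, 15, 11, 0),    # Saturday
    datetime(2025, 3, 17, 10, 30),   # Monday morning
]
print([commit_bucket(c) for c in commits])  # ['night', 'weekend', 'weekday']
```

      Aggregating these buckets per year over a decade of commits yields the proportion trends the study reports.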


      Modeling Complex Systems Using Object-Oriented Design Techniques

      Date: 25 November 2025
      Presenter: Thodoris Sotiropoulos
      Abstract

      Programming language implementations can be viewed as complex software systems. In this talk, we explore how core object-oriented (OO) principles, especially polymorphism, help us build modular, extensible, and maintainable representations of programs. Using a small example language, we examine traditional approaches such as the "Visitor" design pattern and contrast them with modern Java features: records, sealed types, and pattern matching. We show how these mechanisms allow us to express diverse operations on program representations (evaluation, semantic analysis, etc.) elegantly and declaratively, while avoiding boilerplate and ensuring runtime safety through exhaustiveness checks.


      Data Analysis Applications in Software Engineering

      Date: 05 December 2025
      Presenter: Zoe Kotti
      Abstract

      This presentation examines the impact and evolution of software engineering research through four interconnected studies. First, an investigation of data papers in the Mining Software Repositories conference confirms their significant value as research artifacts while identifying opportunities for improved documentation and broader topic coverage. Second, the practical impact of software engineering research is assessed through a patent analysis and author survey, demonstrating that researchers successfully equip practitioners with tools and methods, though adoption is often hindered by funding and cost-benefit challenges. Third, a comprehensive tertiary study analyzes machine learning applications in software engineering, revealing widespread adoption across tasks but significant gaps in empirical validation and industrial transfer. Finally, the work explores Large Language Models in code completion, analyzing code perplexity to understand model confidence across different programming languages. Together, these contributions offer a holistic view of how academic research translates into practice and how emerging technologies are shaping the future of software engineering.


Note: Data before 2017 may refer to grandfathered work conducted by BALab's members at its progenitor laboratory, ISTLab.