Yearly Report 2021


Faculty Members

  • Damianos Chatziantoniou
  • Dimitris Mitropoulos
  • Panos (Panagiotis) Louridas
  • Diomidis Spinellis

Senior Researchers

  • Vaggelis Atlidakis
  • Makrina Viola Kosti
  • Vasiliki Efstathiou
  • Stefanos Georgiou
  • Thodoris Sotiropoulos
  • Marios Fragkoulis

Associate Researchers

  • Charalambos-Ioannis Mitropoulos
  • Zoe Kotti
  • Konstantinos Kravvaritis
  • Stefanos Chaliasos
  • Antonios Gkortzis
  • Tushar Sharma
  • Konstantina Dritsa

Researchers

  • Georgios Liargkovas
  • Rafaila Galanopoulou
  • Nikos Georgakopoulos
  • Georgios-Petros Drosos
  • George Theodorou
  • Christos Pappas
  • Angeliki Papadopoulou
  • George Metaxopoulos
  • Theodosis Tsaklanos
  • Michael Loukeris
  • Marios Papachristou
  • Christos Chatzilenas
  • Ioannis Batas
  • Efstathia Chioteli
  • Vitalis Salis

Overview in numbers

New Publications Number
Monographs and Edited Volumes 2
Journal Articles 4
Book Chapters 1
Conference Publications 7
Technical Reports 0
White Papers 0
Magazine Articles 0
Working Papers 0
Datasets 1
Total New Publications 15
New Projects 0
Ongoing Projects 2
Completed Projects 0
Faculty Members 4
Senior Researchers 6
Associate Researchers 7
Researchers 15
Total Members 32
New Members 2
Ongoing PhDs 5
Completed PhDs 1
New Seminars 13

New Publications

Monographs and Edited Volumes

    • Stefanos Georgiou. Energy and Run-Time Performance Practices in Software Engineering. PhD thesis, Athens University of Economics and Business, Athens, Greece, February 2021.
    • Diomidis Spinellis, Georgios Gousios, Marsha Chechik, and Massimiliano Di Penta, editors. ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, New York, NY, USA, 2021. Association for Computing Machinery.

Journal Articles

    • Diomidis Spinellis, Panos Louridas, and Maria Kechagia. Software evolution: the lifetime of fine-grained elements. PeerJ Computer Science, 7:e372, February 2021.
    • Diomidis Spinellis. Why computing students should contribute to open source software projects. Communications of the ACM, 64(7):36–38, July 2021.
    • Tushar Sharma, Vasiliki Efstathiou, Panos Louridas, and Diomidis Spinellis. Code smell detection by deep direct-learning and transfer-learning. Journal of Systems and Software, 176:110936, 2021.
    • Panos Louridas and Diomidis Spinellis. Conspicuous corruption: evidence at a country level. PLOS ONE, 16(9):e0255970, September 2021.

Book Chapters

    • Dimitris Mitropoulos, Theodosios Tsaklanos, and Diomidis Spinellis. Secure software technologies. In Sokratis Katsikas, Stefanos Gritzalis, and Konstantinos Lambrinoudakis, editors, Information and System Security in the Cyberspace. NewTech Pub, 2021.

Conference Publications

    • Thodoris Sotiropoulos, Stefanos Chaliasos, Vaggelis Atlidakis, Dimitris Mitropoulos, and Diomidis Spinellis. Data-oriented differential testing of object-relational mapping systems. In 43rd International Conference on Software Engineering, ICSE '21. May 2021. Distinguished Artifact Award.
    • Pedro F. Silvestre, Marios Fragkoulis, Diomidis Spinellis, and Asterios Katsifodimos. Clonos: consistent causal recovery for highly-available streaming dataflows. In Proceedings of the 2021 International Conference on Management of Data, SIGMOD/PODS '21, 1637–1650. New York, NY, USA, 2021. Association for Computing Machinery.
    • Vitalis Salis, Thodoris Sotiropoulos, Panos Louridas, Diomidis Spinellis, and Dimitris Mitropoulos. PyCG: practical call graph construction in Python. In 43rd International Conference on Software Engineering, ICSE '21. May 2021.
    • Georgios Nikitopoulos, Konstantina Dritsa, Panos Louridas, and Dimitris Mitropoulos. CrossVul: a cross-language vulnerability dataset with commit data. In 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering: Demonstrations Track, ESEC/FSE '21, 1565–1569. ACM, August 2021.
    • Rafaila Galanopoulou and Diomidis Spinellis. A dataset of open-source safety-critical software. In Elisabetta Di Nitto and Pierluigi Plebani, editors, Proceedings of the First SWForum Workshop on Trustworthy Software and Open Source, TSOS '21. March 2021. CEUR Workshop Proceedings, volume 2878.
    • Efstathia Chioteli, Ioannis Batas, and Diomidis Spinellis. Does unit-tested code crash? a case study of Eclipse. In 25th Pan-Hellenic Conference on Informatics, PCI 2021, 260–264. New York, NY, USA, 2021. Association for Computing Machinery.
    • Stefanos Chaliasos, Thodoris Sotiropoulos, Georgios-Petros Drosos, Charalambos Mitropoulos, Dimitris Mitropoulos, and Diomidis Spinellis. Well-typed programs can go wrong: a study of typing-related bugs in JVM compilers. In Proceedings of the ACM on Programming Languages, OOPSLA '21. ACM, October 2021.

Datasets

    • Stefanos Chaliasos, Thodoris Sotiropoulos, Georgios-Petros Drosos, Charalambos Mitropoulos, Dimitris Mitropoulos, and Diomidis Spinellis. Well-typed programs can go wrong: a study of typing-related bugs in JVM compilers. October 2021.

New Members

    • Nikos Georgakopoulos
    • Georgios-Petros Drosos

Ongoing PhDs

    • Zoe Kotti Topic: Data Analysis Applications in Software Engineering
    • Konstantinos Kravvaritis Topic: Data and Quality Metrics of System Configuration Code
    • Antonios Gkortzis Topic: Secure Systems on Cloud Computing Infrastructures
    • Thodoris Sotiropoulos Topic: Abstractions for software testing
    • Konstantina Dritsa Topic: Data Science

Completed PhDs

    • Stefanos Georgiou Topic: Energy and Run-Time Performance Practices in Software Engineering

New Seminars

      R-FCN: Object Detection via Region-based Fully Convolutional Networks

      Date: 15 January 2021
      Presenter: Zoe Kotti

      We introduce the region-based convolutional neural networks (R-CNN) family of machine learning models, which are widely used in computer vision for object detection. Particularly, we focus on the R-FCN model, a region-based, fully convolutional network for accurate and efficient object detection. In contrast to previous region-based detectors such as Fast/Faster R-CNN, that apply a costly per-region subnetwork hundreds of times, R-FCN is fully convolutional with almost all computation shared on the entire image. To achieve this goal, position-sensitive score maps are proposed to address a dilemma between translation-invariance in image classification and translation-variance in object detection. This method can thus naturally adopt fully convolutional image classifier backbones, such as the latest Residual Networks (ResNets), for object detection. The authors of this work show competitive results on the PASCAL VOC datasets (e.g., 83.6% mAP on the 2007 set) with the 101-layer ResNet. Meanwhile, the result is achieved at a test-time speed of 170ms per image, 2.5-20x faster than the Faster R-CNN counterpart. Code is made publicly available at:

      Guidelines for Improving Paper and Peer Review Quality

      Date: 22 January 2021
      Presenter: Diomidis Spinellis

      Redesigning the peer review of software engineering studies can improve the process's fairness, inclusiveness, transparency, and effectiveness. A group led by Paul Ralph, in which your presenter participates, is developing empirical standards under the auspices of ACM SIGSOFT. Today, when a scholar peer reviews a manuscript, they must simultaneously generate and apply a set of evaluation criteria. The problem is that generating an appropriate rubric for judging research quality is mind-numbingly difficult. So reviewers tend to generate incomplete, oversimplified and inappropriate criteria. It’s not because reviewers are stupid; it’s because no one person can generate good rubrics for all of the different kinds of studies SE researchers review. Furthermore, the reviewers, the editor, and the authors may construct totally different rubrics, and these rubrics may depart wildly from published methodological guidance, or the norms of their scientific community. Most frustration with the peer review process comes from authors and reviewers disagreeing on what makes a study good. The power imbalance between authors and reviewers makes correcting some reviewers impossible. We need to change the process so that reviewers all use the right criteria in the first place. The solution is for the software engineering community to decide together what “good” means. For each common methodology, we create a one-page checklist of specific expectations. To prevent reviewers from using the standards in an inflexible, gotcha-like manner, each criterion is paired with a simple decision tree. When a reviewer indicates that a criterion is not satisfied, they will be explicitly asked whether there is a good reason. Providing the same, specific criteria to authors and reviewers will improve research quality, simplify reviewing, reduce conflict and increase acceptance rates.

      The presentation will introduce the taxonomy of provided empirical standards, overview the general standard, which applies to all reviews, and focus on one particular standard as an example.

      Energy and Run-Time Performance Practices in Software Engineering

      Date: 01 February 2021
      Presenter: Stefanos Georgiou

      Energy efficiency for computer systems is an ever-growing matter that has caught the attention of the software engineering community. Although hardware design and utilization are undoubtedly key factors affecting energy consumption, there is solid evidence that software design can also significantly alter the energy consumption of IT products. Therefore, the goal of this dissertation is to show the impact of software design decisions on the energy consumption of a computer system.

      Initially, we analyzed 92 research papers from top-tier conferences and categorized them under the Software Development Life Cycle taxonomy. From this study, we were able to find many research challenges. Among these challenges, we identified that there is limited work in the context of different programming languages’ energy and delay implications.

      To this end, we performed an empirical study and pointed out which programming languages can introduce better energy and run-time performance for specific programming tasks and computer platforms (i.e., server, laptop, and embedded system). Motivated further by our survey results, we performed an additional study on different programming languages and computer platforms to demonstrate the energy and delay implications of various inter-process communication technologies (i.e., REST, RPC, gRPC).

      From the above studies, we were able to introduce guidelines on reducing the energy consumption of different applications by suggesting which programming languages to utilise in specific cases. Finally, we performed experiments to examine the energy and run-time performance tax that security measures impose across 128 distinct benchmark suites. By investigating the impact of CPU-related vulnerabilities (Meltdown, Spectre, and MDS), communication-related security measures (HTTP/HTTPS), memory protection (memory zeroing), and compiler safeguards (GCC), we have found that these measures can impact the energy and run-time performance of real-world applications (Nginx, Apache, Redis) by up to 20%.

      Understanding and Characterizing Type System-Related Bugs in Compilers of JVM Programming Languages

      Date: 05 March 2021
      Presenter: Thodoris Sotiropoulos

      Over the past decade, there has been huge interest in compiler testing, which led to the disclosure of thousands of bugs in well-established and widely-used compilers. Despite this tremendous success, current research endeavors have mainly focused on detecting frustrating compiler crashes and subtle miscompilations caused by bugs in the implementation of compiler optimizations. However, in statically-typed languages, the frontend part of a compiler is equally important, as it is the component that decides whether the input program is correct or not. In modern programming languages with sophisticated type system features, the implementation of the frontend is much more complex, and therefore type system-related bugs are quite frequent. Bugs in the implementation of the frontend can break the soundness of the type system, lead to the rejection of correct programs, or make the compiler produce misleading reports and warnings.

      We present a study of bugs found in compiler frontends. Specifically, we examine frontend bugs reported in the top JVM programming languages, namely, Java, Scala, Kotlin, and Groovy. We evaluate each bug in terms of several criteria, including their symptom, root cause, characteristics of the test case that triggers the bug, and finally we propose a categorization. We believe that this work opens up a new direction in compiler testing, which is currently overlooked.

      Detecting Word Usage Change

      Date: 01 April 2021
      Presenter: Kaiti Thoma and Konstantina Dritsa

      The appearance of large text corpora, covering extensive time periods, has allowed researchers to investigate qualitatively the change in the semantics of words over time. We will present the main methods, starting from an introduction to the underlying technologies that have been developed over the last decades, and presenting some highlights of results from the literature.

      PyCG: Practical Call Graph Generation in Python

      Date: 08 April 2021
      Presenter: Vitalis Salis

      Call graphs play an important role in different contexts, such as profiling and vulnerability propagation analysis. Generating call graphs in an efficient manner can be a challenging task when it comes to high-level languages that are modular and incorporate dynamic features and higher-order functions.

      Despite the language's popularity, there have been very few tools aiming to generate call graphs for Python programs. Worse, these tools suffer from several effectiveness issues that limit their practicality in realistic programs. We propose a pragmatic, static approach for call graph generation in Python. We compute all assignment relations between program identifiers of functions, variables, classes, and modules through an inter-procedural analysis. Based on these assignment relations, we produce the resulting call graph by resolving all calls to potentially invoked functions. Notably, the underlying analysis is designed to be efficient and scalable, handling several Python features, such as modules, generators, function closures, and multiple inheritance.

      We have evaluated our prototype implementation, which we call PyCG, using two benchmarks: a micro-benchmark suite containing small Python programs and a set of macro-benchmarks with several popular real-world Python packages. Our results indicate that PyCG can efficiently handle thousands of lines of code in less than a second (0.38 seconds for 1k LoC on average). Further, it outperforms the state-of-the-art for Python in both precision and recall: PyCG achieves high rates of precision ~99.2%, and adequate recall ~69.9%. Finally, we demonstrate how PyCG can aid dependency impact analysis by showcasing a potential enhancement to GitHub's "security advisory" notification service using a real-world example.
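
      As a rough illustration of the idea of resolving calls through assignment relations, the following minimal sketch builds a naive call graph with Python's standard ast module. It is not PyCG's algorithm: it only looks at module-level functions, follows simple name-to-name assignments inside each function in a flow-insensitive way, and ignores classes, modules, closures, and inheritance. The function name build_call_graph and the sample program are hypothetical.

```python
import ast


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each module-level function to the module-level functions it calls.

    Assignments such as `alias = func` are recorded so that a call through
    `alias` resolves to `func`. Illustration only, not PyCG's analysis.
    """
    tree = ast.parse(source)
    functions = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    defined = {f.name for f in functions}
    graph: dict[str, set[str]] = {name: set() for name in defined}

    for func in functions:
        # Assignment relations inside this function: alias -> assigned name.
        points_to: dict[str, str] = {}
        for node in ast.walk(func):
            if isinstance(node, ast.Assign) and isinstance(node.value, ast.Name):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        points_to[target.id] = node.value.id

        for node in ast.walk(func):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                callee = node.func.id
                # Follow alias chains, guarding against cyclic assignments.
                seen: set[str] = set()
                while callee in points_to and callee not in seen:
                    seen.add(callee)
                    callee = points_to[callee]
                if callee in defined:
                    graph[func.name].add(callee)
    return graph


# Hypothetical sample program used to exercise the sketch.
SAMPLE = '''
def helper():
    return 1

def main():
    f = helper
    g = f
    return g() + helper()

def unused():
    print("hi")
'''

if __name__ == "__main__":
    for caller, callees in sorted(build_call_graph(SAMPLE).items()):
        print(caller, "->", sorted(callees))
```

      In this sketch the call g() inside main is attributed to helper because the alias chain g, f, helper is followed; calls to names that are not module-level functions, such as print, are dropped.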

      Data-Oriented Differential Testing of Object-Relational Mapping Systems

      Date: 22 April 2021
      Presenter: Thodoris Sotiropoulos

      We introduce what is, to the best of our knowledge, the first approach for systematically testing Object-Relational Mapping (ORM) systems. Our approach leverages differential testing to establish a test oracle for ORM-specific bugs. Specifically, we first generate random relational database schemas, set up the respective databases, and then, we query these databases using the APIs of the ORM systems under test. To tackle the challenge that ORMs lack a common input language, we generate queries written in an abstract query language. These abstract queries are translated into concrete, executable ORM queries, which are ultimately used to differentially test the correctness of target implementations. The effectiveness of our method heavily relies on the data inserted into the underlying databases. Therefore, we employ a solver-based approach for producing targeted database records with respect to the constraints of the generated queries. We implement our approach as a tool, called CYNTHIA, which found 28 bugs in five popular ORM systems. The vast majority of these bugs are confirmed (25 / 28), more than half were fixed (20 / 28), and three were marked as release blockers by the corresponding developers.
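
      The differential-testing oracle described above can be sketched in a few lines: one abstract query is translated into two concrete executions, and any disagreement between their results signals a bug in one of them. This is only a sketch under strong assumptions, not CYNTHIA: the AbstractQuery class, the item table, and the test rows are hypothetical, and the two "systems under test" are stand-ins (SQL through Python's sqlite3 module and a plain Python evaluation) rather than real ORMs.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical test data: (id, price) rows for a made-up `item` table.
ROWS = [(1, 10), (2, 25), (3, 40)]
COLUMNS = {"id": 0, "price": 1}


@dataclass(frozen=True)
class AbstractQuery:
    """Hypothetical abstract query: ids of rows where column > threshold."""
    column: str
    threshold: int


def run_sql(query: AbstractQuery, rows: list[tuple[int, int]]) -> list[int]:
    """First backend: translate the abstract query to SQL and run it."""
    if query.column not in COLUMNS:
        raise ValueError(f"unknown column: {query.column}")
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, price INTEGER)")
        conn.executemany("INSERT INTO item VALUES (?, ?)", rows)
        # The column name was checked against a fixed set above.
        sql = f"SELECT id FROM item WHERE {query.column} > ? ORDER BY id"
        return [row[0] for row in conn.execute(sql, (query.threshold,))]
    finally:
        conn.close()


def run_python(query: AbstractQuery, rows: list[tuple[int, int]]) -> list[int]:
    """Second backend: evaluate the same abstract query in plain Python."""
    index = COLUMNS[query.column]
    return sorted(row[0] for row in rows if row[index] > query.threshold)


def differential_test(query: AbstractQuery, rows: list[tuple[int, int]]) -> bool:
    """Return True when both backends agree on the query result."""
    return run_sql(query, rows) == run_python(query, rows)


if __name__ == "__main__":
    print(differential_test(AbstractQuery("price", 20), ROWS))
```

      The value of such an oracle depends on the inserted rows: with an empty table every query trivially agrees, which is why the abstract stresses generating records targeted at the constraints of each query.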

      Maritime Analytics with Real-Time Big Ship Tracking Data

      Date: 20 May 2021
      Presenter: Vasiliki Efstathiou

      Shipping has been the driving force of global trade for centuries. Today, it remains the major means of cargo transportation with almost 90% of the world’s goods estimated to be carried by sea. At the same time, shipping generates an enormous footprint of data that can unlock new possibilities for the maritime industry.

      MarineTraffic is currently the world’s leading platform offering ship tracking services and actionable maritime intelligence. Research at MarineTraffic is a paradigm of an applied research initiative, aiming to bring tangible outcomes to the market. This talk will present the lab’s efforts towards building systems for situational awareness at sea globally, demonstrating cases where the need for maritime intelligence is evident. The presentation will focus on ways of harnessing earth observation, ship tracking and behavioural data and will outline challenges and research opportunities in the journey to maritime digitalisation.

      DataMingler: A Novel Approach to Data Virtualization (ACM SIGMOD 2021)

      Date: 03 June 2021
      Presenter: Damianos Chatziantoniou

      A Data Virtual Machine (DVM) is a novel graph-based conceptual model, similar to the entity-relationship model, representing existing data (persistent, transient, derived) of an organization. A DVM can be built quickly, agilely, offering schematic flexibility to data engineers. Data scientists can visually define complex dataframe queries in an intuitive and simple manner, which are evaluated within an algebraic framework. A DVM can be easily materialized in any logical data model and can be “reoriented” around any node, offering a “single view of any entity”. In this paper we demonstrate DataMingler, a tool implementing DVMs. We argue that DVMs can have a significant practical impact in analytics environments.

      Input Algebras

      Date: 17 June 2021
      Presenter: Rahul Gopinath, Hamed Nemati, Andreas Zeller

      Grammar-based test generators are highly efficient in producing syntactically valid test inputs, and give their user precise control over which test inputs should be generated. Adapting a grammar or a test generator towards a particular testing goal can be tedious, though. We introduce the concept of a grammar transformer, specializing a grammar towards inclusion or exclusion of specific patterns: “The phone number must not start with 011 or +1”. To the best of our knowledge, ours is the first approach to allow for arbitrary Boolean combinations of patterns, giving testers unprecedented flexibility in creating targeted software tests. The resulting specialized grammars can be used with any grammar-based fuzzer for targeted test generation, but also as validators to check whether the given specialization is met or not, opening up additional usage scenarios. In our evaluation on real-world bugs, we show that specialized grammars are accurate both in producing and validating targeted inputs.
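
      To make the notion of specializing a grammar concrete, here is a deliberately tiny sketch. It assumes a hypothetical toy phone-number grammar in which every prefix is its own alternative, so excluding "011" and "+1" reduces to dropping two alternatives. The approach in the talk handles arbitrary Boolean combinations of patterns through grammar transformation, which this sketch does not attempt; all names below are illustrative.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to a list of alternatives,
# and each alternative is a sequence of terminals or nonterminals.
GRAMMAR = {
    "<phone>": [["<prefix>", "<digit>", "<digit>", "<digit>", "<digit>"]],
    "<prefix>": [["011"], ["+1"], ["+30"], ["+44"]],
    "<digit>": [[d] for d in "0123456789"],
}


def exclude_alternatives(grammar, nonterminal, banned):
    """Return a specialized copy of the grammar without the banned alternatives."""
    specialized = dict(grammar)
    specialized[nonterminal] = [
        alt for alt in grammar[nonterminal] if alt not in banned
    ]
    return specialized


def generate(grammar, symbol, rng):
    """Expand a symbol into a string by picking random alternatives."""
    if symbol not in grammar:
        return symbol  # terminal symbol
    alternative = rng.choice(grammar[symbol])
    return "".join(generate(grammar, part, rng) for part in alternative)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    no_011_or_plus1 = exclude_alternatives(GRAMMAR, "<prefix>", [["011"], ["+1"]])
    for _ in range(3):
        print(generate(no_011_or_plus1, "<phone>", rng))
```

      The toy grammar was chosen so that no remaining prefix followed by digits can spell a banned one. With a prefix such as "0", the digits "11" could recreate "011", which is the kind of case that calls for a real grammar transformation rather than simple pruning.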

      Well-Typed Programs Can Go Wrong: A Study of Typing-Related Bugs in JVM Compilers

      Date: 24 September 2021
      Presenter: Stefanos Chaliasos

      Despite the substantial progress in compiler testing, research endeavors have mainly focused on detecting compiler crashes and subtle miscompilations caused by bugs in the implementation of compiler optimizations. Surprisingly, this growing body of work neglects other compiler components, most notably the front-end. In statically-typed programming languages with rich and expressive type systems and modern features, such as type inference or a mix of object-oriented with functional programming features, the process of static typing in compiler front-ends is complicated by a high-density of bugs. Such bugs can lead to the acceptance of incorrect programs (breaking code portability or the type system's soundness), the rejection of correct (e.g. well-typed) programs, and the reporting of misleading errors and warnings.

      We conduct, what is to the best of our knowledge, the first empirical study for understanding and characterizing typing-related compiler bugs. To do so, we manually study 320 typing-related bugs (along with their fixes and test cases) that are randomly sampled from four mainstream JVM languages, namely Java, Scala, Kotlin, and Groovy. We evaluate each bug in terms of several aspects, including their symptom, root cause, bug fix's size, and the characteristics of the bug-revealing test cases. Some representative observations indicate that: (1) more than half of the typing-related bugs manifest as unexpected compile-time errors: the buggy compiler wrongly rejects semantically correct programs, (2) the majority of typing-related bugs lie in the implementations of the underlying type systems and in other core components related to operations on types, (3) parametric polymorphism is the most pervasive feature in the corresponding test cases, (4) one third of typing-related bugs are triggered by non-compilable programs.

      We believe that our study opens up a new research direction by driving future researchers to build appropriate methods and techniques for a more holistic testing of compilers.

      Building the warehouse scale computer

      Date: 11 October 2021
      Presenter: John Wilkes, Principal Software Engineer, Google

      Imagine some product team inside Google wants 100,000 CPU cores + RAM + flash + accelerators + disk in a couple of months. We need to decide where to put them, when; whether to deploy new machines, or re-purpose/reconfigure old ones; ensure we have enough power, cooling, networking, physical racks, data centers and (over a longer time-frame) wind power; cope with variances in delivery times from supply logistics hiccups; do multi-year cost-optimal placement+decisions in the face of literally thousands of different machine configurations; keep track of parts; schedule repairs, upgrades, and installations; and generally make all this happen behind the scenes at minimum cost. And then after breakfast, we get to dynamically allocate resources (on the small-minutes timescale) to the product groups that need them most urgently, accurately reflecting the cost (opex/capex) of all the machines and infrastructure we just deployed, and monitoring and controlling the datacenter power and cooling systems to achieve minimum overheads - even as we replace all of these on the fly. This talk will highlight some of the exciting problems we're working on inside Google to ensure we can supply the needs of an organization that is experiencing (literally) exponential growth in computing capacity.

      (The presentation is kindly offered in collaboration with TUDelft Prof. Lydia Chen.)

      Going beyond the "virtually unlimited compute and storage capacity": what the (AWS) Cloud has to offer to researchers

      Date: 24 November 2021
      Presenter: Nikiforos Botis, Solutions Architect, AWS

      The term cloud computing first appeared in 1996, but it wasn't until 2006 that the first service became available that gave access, via the internet, to remote virtual machines that could perform computations without the need to procure a physical server. Things have evolved significantly since then and many Cloud Service Providers (CSPs) have emerged, allowing organizations of all sizes and types, including academic institutions, to leverage the cloud for powering their IT workloads. In this talk, we will be exploring some of the offerings of AWS, the cloud services arm of Amazon, that could be of interest to researchers, especially around the areas of Serverless computing and Machine Learning.

      Nikiforos is a Solutions Architect at the AWS Greek branch, focusing on public sector customers. He has been lucky to have had the chance to contribute to the architecture of both the PLF and Vaccination platforms that were launched during the pandemic to keep the country safe and the economy going. Prior to that, he spent four years in London, the majority of them at the AWS UK branch, which he joined as a graduate after completing his MSc Computer Science at Imperial College London. Nikiforos is a proud graduate of DMST AUEB (BSc), through which he had the chance to participate in multiple entrepreneurial and other extracurricular activities, including a semester abroad (UCL, London).

Note: Data before 2017 may refer to grandparented work conducted by BALab's members at its progenitor laboratory, ISTLab.