Yearly Report 2018

New Publications

Journal Articles

    • Diomidis Spinellis. Under the covers of IEEE Software. IEEE Software, 35(1):4–7, January 2018.
    • Diomidis Spinellis. The challenges and practices of release engineering. IEEE Software, 35(2):4–7, March 2018.
    • Diomidis Spinellis. Self-evolving software architectures. IEEE Software, 35(3):4–7, May 2018.
    • Diomidis Spinellis. Modern debugging: the art of finding a needle in a haystack. Communications of the ACM, 61(11):124–134, October 2018.
    • Tushar Sharma and Diomidis Spinellis. A survey on software smells. Journal of Systems and Software, 138:158–173, 2018.
    • Makrina Viola Kosti, Kostas Georgiadis, Dimitrios A. Adamos, Nikos Laskaris, Diomidis Spinellis, and Lefteris Angelis. Towards an affordable brain computer interface for the assessment of programmers' mental workload. International Journal of Human-Computer Studies, 115:52–66, 2018.
    • Maria Kechagia, Marios Fragkoulis, Panos Louridas, and Diomidis Spinellis. The exception handling riddle: an empirical study on the Android API. Journal of Systems and Software, 2018. Forthcoming.
    • Konstantina Dritsa, Dimitris Mitropoulos, and Diomidis Spinellis. Aspects of the history of computing in modern Greece. IEEE Annals of the History of Computing, 40(1):47–60, May 2018.

Conference Publications

    • Diomidis Spinellis and Georgios Gousios. How to analyze Git repositories with command line tools: we're not in Kansas anymore. In Companion: Proceedings of the 40th International Conference on Software Engineering, ICSE-C '18. New York, NY, USA, May 2018. Association for Computing Machinery. Technical Briefing.
    • Diomidis Spinellis. Unix architecture evolution from the 1970 PDP-7 to the 2018 FreeBSD: important milestones and lessons learned. Full-length presentation, February 2018. FOSDEM '18: Free and Open Source Software Developers' European Meeting. Brussels, Belgium.
    • Diomidis Spinellis. Software business, platforms, and ecosystems: fundamentals of software production research (report from Dagstuhl Seminar 18182). Dagstuhl Reports, 2018.
    • Diomidis Spinellis. Documented Unix facilities over 48 years. In MSR '18: Proceedings of the 15th Conference on Mining Software Repositories. New York, NY, USA, May 2018. Association for Computing Machinery.
    • Tushar Sharma, Marios Fragkoulis, Stamatia Rizou, Magiel Bruntink, and Diomidis Spinellis. Smelly Relations: Measuring and Understanding Database Schema Quality. In 40th International Conference on Software Engineering: Software Engineering in Practice Track, ICSE-SEIP '18. New York, NY, USA, May 2018. Association for Computing Machinery.
    • Tushar Sharma. Detecting and managing code smells: research and practice. In Companion: Proceedings of the 40th International Conference on Software Engineering, ICSE-C '18. New York, NY, USA, May 2018. Association for Computing Machinery. Technical Briefing.
    • Alexander Lattas and Diomidis Spinellis. Echoes from space: grouping commands with large-scale telemetry data. In 40th International Conference on Software Engineering: Software Engineering in Practice Track, ICSE-SEIP '18. New York, NY, USA, May 2018. Association for Computing Machinery.
    • Antonios Gkortzis, Dimitris Mitropoulos, and Diomidis Spinellis. VulinOSS: a dataset of security vulnerabilities in open-source systems. In 15th International Conference on Mining Software Repositories: Data Showcase Track, MSR '18. New York, NY, USA, May 2018. Association for Computing Machinery.
    • Stefanos Georgiou, Maria Kechagia, Panos Louridas, and Diomidis Spinellis. What are your programming language’s energy-delay implications? In 15th International Conference on Mining Software Repositories: Technical Track, MSR '18. New York, NY, USA, May 2018. Association for Computing Machinery.
    • Vasiliki Efstathiou and Diomidis Spinellis. Code review comments: language matters. In 40th International Conference on Software Engineering: New Ideas and Emerging Results Track, ICSE-NIER '18. New York, NY, USA, May 2018. Association for Computing Machinery.
    • Vasiliki Efstathiou, Christos Chatzilenas, and Diomidis Spinellis. Word embeddings for the software engineering domain. In 15th International Conference on Mining Software Repositories: Data Showcase Track, MSR '18. New York, NY, USA, May 2018. Association for Computing Machinery.
    • Moritz Beller, Niels Spruit, Diomidis Spinellis, and Andy Zaidman. On the dichotomy of debugging behavior among programmers. In Proceedings of the 40th International Conference on Software Engineering. May 2018. To appear.

White Papers

    • Diomidis Spinellis, Nikos Vasilakis, Nancy Pouloudi, and Niki Tsouma. Electronic government in Greece: successes, problems, and the road to digital transformation. Available online at https://www.dianeosis.org/research/egov_study/, March 2018. A study prepared for the diaNEOsis think tank. In Greek.

Projects

New Projects

    • Action II(3) - Software Engineering UNified MEmory Computing (SEUNMEC)

Ongoing Projects

    • CROSSMINER - Developer-Centric Knowledge Mining from Large Open-Source Software Repositories

Completed Projects

    • Action II(3) - Software Engineering UNified MEmory Computing (SEUNMEC)
    • SENECA - Software ENgineering in Enterprise Cloud Applications

New Members

    • Zoe Kotti
    • Marios Papachristou
    • Konstantinos Kravvaritis (PhD student)
    • Christos Chatzilenas

Ongoing PhDs

    • Konstantinos Kravvaritis Topic: Data and Quality Metrics of System Configuration Code
    • Antonios Gkortzis Topic: Secure Systems on Cloud Computing Infrastructures
    • Stefanos Georgiou Topic: Energy Efficiency in Cloud Computing
    • Tushar Sharma Topic: Software Engineering in Enterprise Cloud Applications
    • Thodoris Sotiropoulos Topic: Techniques for Improving the Reliability of Event-Driven Programs

Seminars

      Keeping track of licenses

      Date: 10 January 2018
      Presenter: Alexios Zavras
      Abstract

      Software nowadays is an amalgamation of numerous components; typical numbers show that every software product is mostly comprised of re-usable code like libraries, with only up to 20% of the code being specific to the product. Keeping track of all these components and their metadata, such as origin and licenses, is a significant problem that has to be solved by each and every software producer. The talk will discuss the need to keep accurate information on the components and present the attempts to solve the issue that are currently being designed and tried out in the industry. Questions and discussion are welcome and greatly appreciated!

      Alexios Zavras (zvr) is the Senior Open Source Compliance Engineer of Intel Corp. He has been involved with Free and Open Source Software since 1983, and is an evangelist for all things Open. He has a PhD in Computer Science after having studied Electrical Engineering and Computer Science in Greece and the United States.


      DesigniteJava, a tool that potentially extracts a vast amount of code smells

      Date: 31 January 2018
      Presenter: Theodore Stassinopoulos
      Abstract

      There are many tools that perform static analysis on source code and extract information about the existence of code smells. I would like to present our project, which is in progress and focuses on Java source code. The main topics will include a summary of the metrics and code smells that our project is able to extract, the main logical functions performed during execution, and the goals this project tries to achieve.


      Big Data Mediation: A Conceptual Data Model and an Algebraic Framework for Efficient Dataframe Processing

      Date: 21 February 2018
      Presenter: Damianos Chatziantoniou
      Abstract

      Most analytics projects focus on the management of the 3Vs of big data and use specific stacks to support this variety. However, they constrain themselves to "local" data, data that exists within or "close" to the organization. And yet, as it has been recently pointed out, "the value of data explodes when it can be linked with other data." In this paper we present our vision for a global marketplace of analytics (either in the form of per-entity metrics or per-entity data, provided by globally accessible data management tasks), where a data scientist can pick and combine data at will in her data mining algorithms, possibly combining them with her own data. The main idea is to use the dataframe, a popular data structure in R and Python. Currently, the columns of a dataframe contain computations or data found within the data infrastructure of the organization. We propose to extend the concept of a column. A column is now a collection of key-value pairs, produced anywhere by a remotely accessed program (e.g., an SQL query, a MapReduce job, or even a continuous query). The key is used for the outer join with the existing dataframe; the value is the content of the column. This whole process should be orchestrated by a set of well-defined, standardized APIs. We argue that the proposed architecture presents numerous challenges and could be beneficial for big data interoperability. In addition, it can be used to build mediation systems involving local or global columns. Columns correspond to attributes of entities, where the primary key of the entity is the key of the involved columns.
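
      The column extension described in this abstract can be illustrated with a minimal sketch. It is not the presenter's implementation: the function name outer_join_column, the plain-dictionary representation of a dataframe, and the sample data are all assumptions made for illustration. A "remote" column is modelled as a mapping of key-value pairs that is outer-joined onto an existing frame by entity key.

```python
def outer_join_column(frame, column_name, pairs):
    """Outer-join a column of key-value pairs onto a frame.

    frame: dict mapping entity key -> dict of column values (hypothetical
           stand-in for a dataframe indexed by the entity's primary key).
    pairs: dict mapping entity key -> value, as a remote program might return.
    Returns a new frame; cells with no data are filled with None.
    """
    # Collect every column already present in the frame.
    existing_columns = []
    for row in frame.values():
        for column in row:
            if column not in existing_columns:
                existing_columns.append(column)

    # Outer join: keep keys from both sides, frame keys first.
    all_keys = list(frame) + [key for key in pairs if key not in frame]

    result = {}
    for key in all_keys:
        row = {column: None for column in existing_columns}
        row.update(frame.get(key, {}))
        row[column_name] = pairs.get(key)
        result[key] = row
    return result


# Hypothetical local data and a hypothetical remotely produced column.
local_frame = {"c1": {"age": 30}, "c2": {"age": 41}}
remote_purchases = {"c2": 7, "c3": 2}
joined = outer_join_column(local_frame, "purchases", remote_purchases)
```

      In this sketch, keys that appear only on one side survive the join with None in the missing cells, which is the behaviour an outer join implies.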


      Working with TravisCI

      Date: 14 March 2018
      Presenter: Stefanos Georgiou
      Abstract

      Continuous Integration is a cutting-edge approach for frequently building, testing, and integrating software practitioners' work. Using such a service makes error detection quicker and significantly reduces integration problems. In this tutorial, we present Travis CI, a distributed hosting, building, and testing tool for software projects on GitHub. In addition, we show how a developer can connect a GitHub Pages project to Travis to automate GitHub's push procedure.


      Documented Unix Facilities Over 48 Years (MSR rehearsal)

      Date: 29 March 2018
      Presenter: Diomidis Spinellis
      Abstract

      The documented Unix facilities data set provides the details regarding the evolution of 15596 unique facilities through 93 versions of Unix over a period of 48 years. It is based on the manual transcription of early scanned documents, on the curation of text obtained through optical character recognition, and on the automatic extraction of data from code available on the Unix History Repository. The data are categorized into user commands, system calls, C library functions, devices and special files, file formats and conventions, games et al., miscellanea, system maintenance procedures and commands, and system kernel interfaces. A timeline view allows the visualization of the evolution across releases. The data can be used for empirical research regarding API evolution, system design, as well as technology adoption and trends.


      What Are Your Programming Language’s Energy-Delay Implications?

      Date: 18 April 2018
      Presenter: Stefanos Georgiou
      Abstract

      Motivation: Even though many studies examine the energy efficiency of hardware and embedded systems, those that investigate the energy consumption of software applications are still limited, and mostly focused on mobile applications. As modern applications become even more complex and heterogeneous, a need arises for methods that can accurately assess their energy consumption. Goal: Measure the energy consumption and run-time performance of commonly used programming tasks implemented in different programming languages and executed on a variety of platforms to help developers choose appropriate implementation platforms. Method: Obtain measurements to calculate the Energy Delay Product, a weighted function that takes into account a task’s energy consumption and run-time performance. We perform our tests by calculating the Energy Delay Product of 25 programming tasks, found in the Rosetta Code Repository, which are implemented in 14 programming languages and run on three different computer platforms: a server, a laptop, and an embedded system. Results: Compiled programming languages outperform the interpreted ones for most, but not for all tasks. C, C#, and JavaScript are on average the best performing compiled, semi-compiled, and interpreted programming languages for the Energy Delay Product, and Rust appears to be well-placed for I/O-intensive operations, such as file handling. We also find that good energy behaviour can be the result of clever optimizations and design choices in seemingly unexpected programming languages.
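
      As a rough illustration of the metric named in this abstract, the sketch below assumes the common formulation of the Energy Delay Product, EDP = E × T^w, where E is energy, T is run time, and w is a weight that emphasizes performance as it grows. The function names and the measurements are hypothetical and are not taken from the study.

```python
def energy_delay_product(energy_joules, time_seconds, weight=1):
    """Return E * T**weight (assumed formulation of the weighted EDP)."""
    if energy_joules < 0 or time_seconds < 0:
        raise ValueError("energy and time must be non-negative")
    return energy_joules * time_seconds ** weight


def rank_by_edp(measurements, weight=1):
    """Sort implementation names by ascending EDP (lower is better).

    measurements: dict mapping name -> (energy_joules, time_seconds).
    """
    return sorted(
        measurements,
        key=lambda name: energy_delay_product(*measurements[name], weight),
    )


# Hypothetical measurements: impl_a is fast but power-hungry,
# impl_b is slower but consumes less energy overall.
measurements = {"impl_a": (12.0, 2.0), "impl_b": (5.0, 4.0)}
energy_focused = rank_by_edp(measurements, weight=1)
performance_focused = rank_by_edp(measurements, weight=2)
```

      With these made-up numbers the ranking flips between the two weights, which shows why the choice of weight matters when comparing implementations.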


      Smelly Relations: Measuring and Understanding Database Schema Quality

      Date: 24 April 2018
      Presenter: Tushar Sharma
      Abstract

      Context: Databases are an integral element of enterprise applications. Similarly to code, database schemas are also prone to smells, i.e., best practice violations. Objective: We aim to explore database schema quality, associated characteristics and their relationships with other software artifacts. Method: We present a catalog of 13 database schema smells and elicit developers' perspective through a survey. We extract embedded SQL statements and identify database schema smells by employing the DbDeo tool which we developed. We analyze 2925 production-quality systems (357 industrial and 2568 well-engineered open-source projects) and empirically study quality characteristics of their database schemas. In total, we analyze 629 million lines of code containing more than 393 thousand SQL statements. Results: We find that the index abuse smell occurs most frequently in database code, that the use of an ORM framework does not make the application immune to database smells, and that some database smells, such as adjacency list, are more prone to occur in industrial projects compared to open-source projects. Our co-occurrence analysis shows that whenever the clone table smell in industrial projects and the values in attribute definition smell in open-source projects get spotted, it is very likely to find other database smells in the project. Conclusion: The awareness and knowledge of database smells are crucial for developing high-quality software systems and can be enhanced by the adoption of better tools helping developers to identify database smells early.


      Word Embeddings for the Software Engineering Domain

      Date: 24 April 2018
      Presenter: Vasiliki Efstathiou
      Abstract

      The software development process produces vast amounts of textual data expressed in natural language. Outcomes from the natural language processing community have been adapted in software engineering research for leveraging this rich textual information; these include methods and readily available tools, often furnished with pre-trained models. State-of-the-art pre-trained models, however, capture general, common sense knowledge, with limited value when it comes to handling data specific to a specialized domain. There is currently a lack of domain-specific pre-trained models that would further enhance the processing of natural language artefacts related to software engineering. To this end, we release a word2vec model trained over 15GB of textual data from Stack Overflow posts. We illustrate how the model disambiguates polysemous words by interpreting them within their software engineering context. In addition, we present examples of fine-grained semantics captured by the model that imply transferability of these results to diverse, targeted information retrieval tasks in software engineering and motivate further reuse of the model.


      Parallel Computing in the era of the Cloud and Heterogeneous Computing

      Date: 07 May 2018
      Presenter: Rizos Sakellariou
      Abstract

      Traditionally, the objective of parallel computing has been to minimize execution time. As the complexity and the costs associated with modern execution platforms and infrastructures grow, parallel execution time cannot be viewed as a single objective to achieve at any cost. Instead, with such execution platforms consuming large amounts of energy, one needs to assess improvements in execution time against other types of cost. Cloud computing platforms, which are often used to execute parallel applications, typically follow a resource-on-demand paradigm, where users can pay for what resources they need. However, the underlying infrastructures suffer from increasing complexity which is partly masked by having users pay, sometimes for more than they need.

      In this respect, the talk will motivate the need to address efficiently the issues related to the concurrent use of multiple (and often heterogeneous) resources offered by Cloud providers by capturing these issues as some form of a multi-objective optimization problem, which requires a good understanding and appreciation of different trade-offs. The talk will make this argument by presenting experience and research on planning the (parallel) execution of scientific workflow applications on the Cloud in a way that tries to strike a balance between different trade-offs such as execution time, energy consumption and cost. Algorithms and techniques, experimental results and ongoing research will be presented.

      Rizos Sakellariou obtained his PhD from the University of Manchester in 1997 for a thesis on compile-time parallel loop partitioning and scheduling. Since then, he has held posts with Rice University and the University of Cyprus and for the last 18 years with the University of Manchester where he is leading a laboratory carrying out research in High-Performance, Parallel and Distributed systems, which over the last ten years has hosted more than 30 doctoral students, researchers and long-term visitors. He has carried out research on a number of topics related to parallel and distributed computing (including Grid and Cloud computing), with an emphasis on problems stemming from efficient/effective resource utilization and workload allocation issues. Further information about his research can be found on Google Scholar.


      Echoes from Space - Grouping Commands with Large-Scale Telemetry Data

      Date: 09 May 2018
      Presenter: Alexandros Lattas
      Abstract

      Background: As evolving desktop applications continuously accrue new features and grow more complex with denser user interfaces and deeply-nested commands, it becomes inefficient to use simple heuristic processes for grouping GUI commands in multi-level menus. Existing search-based software engineering studies on user performance prediction and command grouping optimization lack evidence-based answers on choosing a systematic grouping method. Research Questions: We investigate the scope of command grouping optimization methods to reduce a user’s average task completion time and improve their relative performance, as well as the benefit of using detailed interaction logs compared to sampling. Method: We introduce seven grouping methods and compare their performance based on extensive telemetry data, collected from program runs of a CAD application. Results: We find that methods using global frequencies, user-specific frequencies, deterministic and stochastic optimization, and clustering perform the best. Conclusions: We reduce the average user task completion time by more than 17%, by running a Knapsack Problem algorithm on clustered users, training only on a small sample of the available data. We show that with most methods using just a 1% sample of the data is enough to obtain nearly the same results as those obtained from all the data. Additionally, we map the methods to specific problems and applications where they would perform better. Overall, we provide a guide on how practitioners can use search-based software engineering techniques when grouping commands in menus and interfaces, to maximize users’ task execution efficiency.


      Code Review Comments: Language Matters

      Date: 23 May 2018
      Presenter: Vasiliki Efstathiou
      Abstract

      Recent research provides evidence that effective communication in collaborative software development has significant impact on the software development lifecycle. Although related qualitative and quantitative studies point out textual characteristics of well-formed messages, the underlying semantics of the intertwined linguistic structures still remain largely misinterpreted or ignored. In particular, regarding the quality of code reviews, the importance of thorough feedback and explicit rationale is often mentioned but rarely linked with related linguistic features. As a first step towards addressing this shortcoming, we propose grounding these studies on theories of linguistics. We particularly focus on linguistic structures of coherent speech and explain how they can be exploited in practice. We reflect on related approaches and examine, through a preliminary study on four open source projects, possible links between existing findings and the directions we suggest for detecting textual features of useful code reviews.


      VulinOSS: A Dataset of Security Vulnerabilities in Open-source Systems

      Date: 23 May 2018
      Presenter: Antonis Gkortzis
      Abstract

      Examining the different characteristics of open-source software in relation to security vulnerabilities can provide the research community with findings that can lead to the development of more secure systems. We present a dataset where the reported vulnerabilities of 8694 open-source project versions can be correlated with the corresponding source code and a number of software metrics. The metrics were obtained by analyzing the project's source code via well-established tools. Apart from commonly used metrics (e.g., LOC), we also provide data related to modern development trends such as continuous integration and testing. We outline motivational examples based on the dataset we describe.


      Anonymisation Through Re-encryption Shuffles

      Date: 21 June 2018
      Presenter: Panos Louridas
      Abstract

      Data anonymisation is not easy: the Internet, after all, was not created for anonymous communications. One way to anonymize digital data is through re-encryption shuffles, i.e., shuffles of re-encrypted data. Re-encryption shuffles are not new, yet efficient open source solutions are hard to come by. This presentation will give the background of challenges faced in anonymisation and report progress in the implementation of a new re-encryption shuffle that uses modern cryptographic techniques.


      Java Decompiler using Machine Translation Techniques

      Date: 10 July 2018
      Presenter: Christos Chatzilenas
      Abstract

      A decompiler is a computer program that takes as input an executable file and produces a high-level source code file which can be recompiled successfully. Even though a decompiler may not always perfectly reconstruct the original source code, it remains an important tool for reverse engineering of computer software. The process of decompilation is very useful for the recovery of lost source code, for analyzing and understanding software whose code is not available, and even for computer security in some cases. In this thesis, in order to address the decompilation problem we transform it into a translation problem which can be solved using machine translation. Two approaches are studied, statistical and neural machine translation, using two open-source tools, Moses and OpenNMT, respectively. Maven repositories are retrieved from GitHub in order to form the dataset and an appropriate procedure is used to construct the parallel corpora. In this context, experiments in Moses are not successful, while the result of translation using neural machine translation is fairly good. The difference between the decompiler presented in this thesis and existing Java decompilers is the fact that it can translate isolated bytecode snippets. Furthermore, this approach can be extended to produce better results by recovering comments, variables, methods and class names. Finally, this study illustrates that the Java source code which is produced from the decompilation is often accurate and can provide a useful picture of the snippets' behavior.


      Speech quality and sentiment analysis on the Hellenic Parliament proceedings

      Date: 10 July 2018
      Presenter: Konstantina Dritsa
      Abstract

      “It's not what you say, but how you say it”. How often have you heard that phrase? Have you ever wished that you could take an objective and comprehensive look into what is said and how it is said in politics? Within this project, we examined the records of the Hellenic Parliament sittings from 1989 up to 2017 in order to evaluate the speech quality and examine the palette of sentiments that characterize the communication among its members. The readability of the speeches is evaluated with the use of the “Simple Measure of Gobbledygook” (SMOG) formula, partially adjusted to the Greek language. The sentiment mining is achieved with the use of two Greek sentiment lexicons. Our findings indicate a significant drop in the average readability score of the parliament records from 2003 up to 2017. On the other hand, the sentiment analysis presents steady scores throughout the years. The communication among parliament members is characterized mainly by the feeling of surprise, followed closely by anger and disgust. At the same time, our results show a steady prevalence of positive words over negative ones. The results are presented in graphs, mainly in comparison between political parties as well as between time intervals.
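
      For readers unfamiliar with the readability measure mentioned above, here is a minimal sketch of the standard (English-language) SMOG grade formula. It does not reproduce the Greek-language adjustment used in the project, which is not described in this abstract; the function name is hypothetical, and counting polysyllabic words (three or more syllables) is left to the caller.

```python
import math


def smog_grade(polysyllable_count, sentence_count):
    """Standard SMOG grade: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291.

    polysyllable_count: number of words with three or more syllables.
    sentence_count: number of sentences in the analysed sample.
    """
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    if polysyllable_count < 0:
        raise ValueError("polysyllable_count must be non-negative")
    normalised = polysyllable_count * 30 / sentence_count
    return 1.0430 * math.sqrt(normalised) + 3.1291


# Hypothetical counts for two speech samples of 30 sentences each.
plain_speech = smog_grade(polysyllable_count=30, sentence_count=30)
dense_speech = smog_grade(polysyllable_count=90, sentence_count=30)
```

      A higher grade indicates text that is harder to read, so in this sketch the sample with more polysyllabic words per sentence scores higher.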


      Implementation of a Graphical User Interface for Unix Commands

      Date: 19 September 2018
      Presenter: Antonis Spyropoulos
      Abstract

      The Unix operating system is one of the most widespread operating systems. It has many distributions for a plethora of devices. For many years, the only way for users to interact with it was the command line. Shell commands are powerful, but executing them with options and arguments is difficult, because users cannot remember all of them.

      The goal of this work is to implement a graphical user interface which will guide the user on creating valid commands or shell scripts. The interface presents to the user the available options, arguments and their meaning. This information is extracted from each command's source code and documentation.

      The implementation can be split in two parts. The first one is the extraction of the required data for each command. The second one is the creation of a graphical user interface. The extraction tool is reliable for specific commands. However, there are some commands with special characteristics that cannot be extracted reliably. The graphical user interface works perfectly if it is fed with correct data.


      Introduction to the Research Conducted in the SerVal Research Team; Mutation Testing Advances

      Date: 23 October 2018
      Presenter: Yves Le Traon and Mike Papadakis
      Abstract

      Mutation testing realises the idea of using artificial defects to support testing activities. Mutation is typically used as a way to evaluate the adequacy of test suites, to guide the generation of test cases and to support experimentation. Mutation has reached a maturity phase and gradually gains popularity both in academia and in industry. This talk will survey the advances related to the fundamental problems of mutation testing, will set out the challenges and open problems for the future development of the method and will present ongoing industrial projects using mutation testing.

      Yves Le Traon is professor of Computer Science at University of Luxembourg, in the domain of software engineering, with a focus on software testing, and software security, and applications in the domains of mobile computing and sensor-based systems. He is head of the SerVal group (SEcurity, Reasoning and VALidation) of the Interdisciplinary Centre for Security, Reliability and Trust (SnT). His research interests include (1) innovative testing and debugging techniques, (2) mobile Android security using static code analysis, machine learning techniques and, (3) model-driven analytics with applications in the domains of IoT, smart grid, Fintech, Industry 4.0, and data-intensive systems in general.

      Mike Papadakis is a research scientist at Luxembourg University's Interdisciplinary Centre for Security, Reliability and Trust.


      Static Analysis for Asynchronous JavaScript Programs

      Date: 12 November 2018
      Presenter: Thodoris Sotiropoulos
      Abstract

      Asynchrony has become an inherent element of JavaScript because it is a means of improving the scalability and performance of modern web applications. To this end, JavaScript provides programmers with a wide range of constructs and features for developing code that performs asynchronous computations, including but not limited to timers, promises and non-blocking I/O. However, the data flow imposed by asynchrony is implicit and is not always well understood by developers, who introduce many asynchrony-related bugs into their programs. Even worse, there are few tools and techniques available for analysing and reasoning about such asynchronous applications. In this work, we address this issue by designing and implementing one of the first static analysis schemes capable of dealing with almost all the asynchronous primitives of JavaScript up to the 7th edition of the ECMAScript specification. Specifically, we introduce the callback graph, a representation for capturing data flow between asynchronous code. We exploit the callback graph for designing a more precise analysis that respects the execution order between different asynchronous functions. We parameterise our analysis with two novel context-sensitivity strategies and we end up with multiple analysis variations for building the callback graph. Then we perform a number of experiments on a set of hand-written and real-world JavaScript programs. Our results show that our analysis can be applied to medium-sized programs achieving 79% precision. The findings further suggest that analysis sensitivity is able to improve accuracy up to 11.3% on average, without highly sacrificing performance.


      Practices and Tools for Better Software Testing

      Date: 05 December 2018
      Presenter: Davide Spadini
      Abstract

      Automated testing has become an essential process for improving the quality of software systems. In fact, testing can help to point out defects and to ensure that production code is robust under many usage conditions. However, writing and maintaining high-quality test code is challenging and frequently considered of secondary importance. Managers, as well as developers, do not treat test code as equally important as production code, and this behaviour could lead to poor test code quality, and in the future to defect-prone production code. The goal of my research is to bring awareness to developers on the effect of poor testing, as well as helping them in writing better test code. To this aim, I am working on 2 different perspectives: (1) studying best practices on software testing, identifying problems and challenges of current approaches, and (2) building new tools that better support the writing of test code, that tackle the issues we discovered with previous studies.


      High-availability in scale-out stream processing

      Date: 17 December 2018
      Presenter: Marios Fragkoulis
      Abstract

      Data stream processing offers low latency processing of bounded and unbounded data sets with strict semantics over the time dimension of data and consistent fault-tolerant operation. The applicability of data stream processing and the maturity of current stream processing systems (SPS) have produced numerous and large scale deployments around the world for commercial and other important use cases, such as fraud detection, risk analysis, and disaster prediction.

      When processing unbounded data for commercial or critical purposes, the capability to recover from failures quickly, if not transparently, is important. In this work in progress we study three different recovery configurations:

          • restart recovery, which restarts a stream processing job from the latest checkpointed state in case of an operator failure
          • standby task recovery, which substitutes a failed operator with a standby instance
          • process pair, which runs two coordinated instances of a stream processing job in order to switch from one to the other in case of failure

      Through empirical experiments we plan to map the tradeoffs between availability and resource utilization across the recovery configurations.


Note: Some of the above data refer to grandfathered work conducted by BALab's members at its progenitor laboratory, ISTLab.