Our research focuses on the correctness and performance of software on modern computer architectures. Recent advances in multicore and memory technologies offer numerous opportunities to increase the efficiency and performance of software. But the increasing complexity and hardware abstraction provided by today’s programming languages complicate the orchestration of software on the multicore substrate, in cloud-based data centers, and on the blockchain.

We investigate system-level techniques that improve the performance and safety of software systems. Our optimizations span the entire software stack, including applications.

Current research topics include streaming system orchestration in the Cloud, parallelization of matrix languages and Kronecker algebra operations on heterogeneous multicores and the Cloud, high-level language memory consistency models, graph neural networks, and safe and efficient smart contract execution on blockchain virtual machines.

The most up-to-date overview of our work is provided in our list of publications.

Cloud Streaming Frameworks

Big Data streaming applications face unbounded (infinite) data sets at a scale of millions of events per second. The information captured in a single event, e.g., the GPS position of a mobile phone user, loses value (perishes) over time and therefore requires sub-second response latencies.

Big Data streaming engines are programmed in Java, Scala, or related programming languages targeting the Java virtual machine (JVM). Streaming applications are composed of nodes (actors) connected by FIFO data channels. The underlying streaming engine must meet an application’s throughput and latency requirements by orchestrating it on a set of Cloud nodes.

In this project we are investigating orchestration techniques that mitigate performance bottlenecks of cloud-based streaming frameworks and substantially increase their performance. Workload characterization and performance profiling are part of this investigation.

Heterogeneous Computing

Fully exploiting the performance potential of heterogeneous platforms is a challenging problem for developers. CPUs and GPUs have different architectural and performance characteristics, and this difference significantly extends the design space for partitioning a workload across cores.

We have proposed code partitioning, a method that employs characteristics of the application source code (i.e., the amount of data-parallelism and the control dependencies) to assign code to processing cores. Code partitioning constitutes an important alternative to data partitioning, which splits the input data set under the assumption that processors behave identically for a given algorithm, which is rarely the case on heterogeneous architectures.

We have employed our code-partitioning technique to accelerate JPEG decoding and to parallelize Kronecker algebra operations.

Matrix Language Parallelization

Matrix languages such as Matlab and Octave have found widespread use in science and engineering due to their easy-to-use interactive programming environments. Current implementations do not fully utilize the high-performance multicore computing platforms offered in today’s data centers.

In this project we are extending the Julia scientific computing language to automatically parallelize matrix computations across a cluster of heterogeneous multicore nodes.

Efficient Static Analysis of Multi-threaded Software

Kronecker algebra is a matrix calculus that allows the generation of thread interleavings from the source code of a program. These interleavings can be used to prove the absence of deadlocks. Because the number of thread interleavings grows exponentially with the number of threads, static analyses of multi-threaded programs generally suffer from the state-explosion problem.

In this project, we are investigating techniques that can cope with the large problem sizes that arise in the static analysis of multi-threaded software. We have employed lazy evaluation of Kronecker algebra operations to incorporate the synchronization semantics of semaphores and Hoare-style monitors (i.e., Ada’s protected objects), constraining the state space to feasible program paths. Our implementation for heterogeneous multicores substantially accelerates this analysis.

We are currently extending our static analysis framework for Kronecker algebra operations to utilize a cluster of data center nodes. This extension is necessary to provide the memory and CPU resources required for the analysis of larger problem sizes. Conducting Kronecker algebra operations in a distributed system poses new orchestration challenges.

Non-blocking Synchronization, Memory Consistency, and Persistent Memory

Shared-memory multicores require a solution to the synchronization problem when accessing shared data. By avoiding locks, progress guarantees become attainable and scalability improves for many concurrent data structures. Such non-blocking concurrent data structures are difficult to devise because synchronization is restricted to the atomic primitives provided by the underlying hardware. All of today’s CPU architectures provide memory consistency models weaker than sequential consistency, which further complicates the construction of non-blocking concurrent data structures. Persistent memory is a recent addition to the memory hierarchy that extends the non-blocking synchronization problem from the level of coherent caches to the underlying main memory.

To alleviate the difficulties with low-level memory consistency models, we are working on a high-level language primitive, the Concurrent Object, which encapsulates the complexity of non-blocking synchronization in a language-level construct.

We are investigating the performance improvement potential of weak memory consistency models over sequential consistency. Our initial study on the ARM and x86 platforms revealed that the x86 platform generally benefits more from acquire-release consistency than the surveyed ARMv8 CPU.

We are investigating non-blocking synchronization constructs in conjunction with persistent memory.

A Validated Virtual Machine for Smart Contract Execution on the Blockchain

This project targets recent demands for the safety, security, and efficiency of blockchain-based smart contracts. Each smart contract on the blockchain must be guaranteed safe and secure before, during, and after its execution. This requires a virtual machine design that has been proven correct and is insusceptible to attacks. The blockchain and its peer-to-peer decentralized consensus protocol currently consume vast amounts of compute power, storage space, and energy. A drastic efficiency improvement is required to ensure the sustainability and wide applicability of blockchain technologies.

The main goal of our project is to analyze, validate, and improve VM design for smart contract execution on the blockchain, from architecture and ISA to runtime implementation, in terms of three properties: (1) safety, (2) security, and (3) efficiency.