Our research focus is on correctness and performance of
software on modern computer architectures. Recent advances in multicore and
memory technologies provide numerous possibilities to increase the efficiency
and performance of software. However, the increasing complexity of these platforms
and the hardware abstraction provided by today's programming languages complicate the
orchestration of software on the multicore substrate, in cloud-based data centers, and on the blockchain.
We investigate system-level techniques that improve performance and safety of
software systems. Optimizations span the entire software stack, including applications.
Current research topics include streaming system orchestration in the Cloud, parallelization of matrix
languages and Kronecker algebra operations on heterogeneous multicores and the
Cloud, high-level language memory consistency models, graph neural networks,
and safe and efficient smart contract execution on blockchain virtual machines.
The most up-to-date overview of our work is provided in our publications.
Cloud Streaming Frameworks
Big Data streaming applications face unbounded (infinite) data
sets at a scale of millions of events per second. The information captured in a
single event, e.g., the GPS position of a mobile phone user, loses
value (perishes) over time and therefore requires sub-second response latencies.
Big Data streaming engines are programmed in Java, Scala, or
related programming languages targeting the Java Virtual Machine (JVM).
Streaming applications are composed of nodes (actors) connected by FIFO data channels. The
underlying streaming engine must meet an application's throughput and latency requirements by
orchestrating it across a set of Cloud nodes.
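An actor/FIFO topology of this kind can be sketched as stages connected by FIFO channels. The names below (Channel, MapActor) are illustrative and not taken from any particular streaming engine; a real engine would run actors concurrently across Cloud nodes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Function;

// Minimal sketch of a streaming topology: nodes (actors) connected by
// FIFO data channels. Illustrative only; not a specific engine's API.
final class Channel<T> {
    private final Deque<T> fifo = new ArrayDeque<>();
    void push(T event) { fifo.addLast(event); }  // producer end
    T poll() { return fifo.pollFirst(); }        // consumer end
    boolean isEmpty() { return fifo.isEmpty(); }
}

final class MapActor<I, O> {
    private final Function<I, O> op;
    private final Channel<I> in;
    private final Channel<O> out;
    MapActor(Function<I, O> op, Channel<I> in, Channel<O> out) {
        this.op = op; this.in = in; this.out = out;
    }
    // Drain the input channel, applying the operator to each event.
    void run() {
        while (!in.isEmpty()) out.push(op.apply(in.poll()));
    }
}
```

A streaming engine's job is then to map many such actors onto Cloud nodes so the channels never become throughput or latency bottlenecks.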
Through this project we are investigating orchestration techniques to mitigate performance
bottlenecks of cloud-based streaming frameworks and substantially increase their performance.
Workload characterization and performance profiling are part of this investigation.
Fully exploiting the performance potential of heterogeneous platforms is a
challenging problem for developers. CPUs and GPUs have different architectural and
performance characteristics, and this difference significantly enlarges the design
space for partitioning a workload across cores.
We have proposed code partitioning, a method that employs characteristics of the
application source code (i.e., the amount of data parallelism and
the control dependencies) to assign code to processing cores.
Code partitioning constitutes an
important alternative to data partitioning, which splits the input data set on
the assumption that processors behave identically for a given algorithm---which is rarely
the case on heterogeneous architectures.
We have employed our code partitioning technique to accelerate JPEG decoding and to
parallelize Kronecker algebra operations.
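The partitioning decision can be sketched as a heuristic over source-code characteristics: branch-poor, highly data-parallel regions go to the GPU, control-heavy regions to the CPU. The CodeRegion fields, the scoring ratio, and the threshold below are hypothetical illustrations, not the actual analysis.

```java
// Illustrative sketch of code partitioning: each code region is scored by
// its data parallelism versus its control dependencies and assigned to a
// device. All numbers here are hypothetical, not the project's model.
enum Device { CPU, GPU }

final class CodeRegion {
    final String name;
    final int parallelIterations;  // estimated data-parallel work items
    final int branchCount;         // control dependencies in the region
    CodeRegion(String name, int parallelIterations, int branchCount) {
        this.name = name;
        this.parallelIterations = parallelIterations;
        this.branchCount = branchCount;
    }
}

final class Partitioner {
    // GPUs favor wide, branch-poor regions; control-heavy code stays on the CPU.
    static Device assign(CodeRegion r) {
        double ratio = (double) r.parallelIterations / (1 + r.branchCount);
        return ratio >= 64.0 ? Device.GPU : Device.CPU;
    }
}
```

In the JPEG example, a data-parallel kernel such as the inverse DCT would score toward the GPU, while control-dependent Huffman decoding would stay on the CPU.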
Matrix Language Parallelization
Matrix languages such as Matlab and Octave have found widespread use in science and engineering
due to their easy-to-use interactive programming environments. Current implementations, however, do
not fully utilize the high-performance multicore computing platforms offered in today's data centers.
In this project we are extending the Julia
scientific computing language to automatically
parallelize matrix computations across a cluster of heterogeneous multicore nodes.
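A single-node flavor of this idea can be sketched with a matrix operation whose result rows are computed independently and in parallel; the distributed, heterogeneous orchestration the project targets is omitted here.

```java
import java.util.stream.IntStream;

// Sketch of automatic matrix-operation parallelization on one node:
// each result row of a matrix product is independent, so the row loop
// can be run in parallel. Cluster-level distribution is not shown.
final class ParMatMul {
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b[0].length, k = b.length;
        double[][] c = new double[n][m];
        IntStream.range(0, n).parallel().forEach(i -> {
            for (int j = 0; j < m; j++) {
                double sum = 0.0;
                for (int p = 0; p < k; p++) sum += a[i][p] * b[p][j];
                c[i][j] = sum;  // row i is written only by this task
            }
        });
        return c;
    }
}
```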
Efficient Static Analysis of Multi-threaded Software
Kronecker algebra is a matrix calculus which allows the generation of thread interleavings
from the source code of a program. Thread interleavings can be used for proving
the absence of deadlocks.
Because the number of thread interleavings grows exponentially with the number
of threads, static analyses of multithreaded programs generally suffer
from the state explosion problem.
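The core matrix operations can be sketched as follows. For two independent threads with adjacency matrices A and B, their interleavings are commonly described by the Kronecker sum A ⊕ B = A ⊗ I + I ⊗ B, whose dimension is the product of the operands' dimensions, which is where the exponential growth comes from. This sketch shows the plain dense operations, not the project's implementation.

```java
// Dense Kronecker product and Kronecker sum on adjacency matrices.
// The Kronecker sum of two thread graphs yields their interleaving graph.
final class Kron {
    static int[][] product(int[][] a, int[][] b) {
        int n = a.length, m = b.length;
        int[][] c = new int[n * m][n * m];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int r = 0; r < m; r++)
                    for (int s = 0; s < m; s++)
                        c[i * m + r][j * m + s] = a[i][j] * b[r][s];
        return c;
    }

    static int[][] identity(int n) {
        int[][] id = new int[n][n];
        for (int i = 0; i < n; i++) id[i][i] = 1;
        return id;
    }

    // A (+) B = A (x) I + I (x) B: either thread takes the next step.
    static int[][] sum(int[][] a, int[][] b) {
        int[][] left = product(a, identity(b.length));
        int[][] right = product(identity(a.length), b);
        int n = left.length;
        int[][] c = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                c[i][j] = left[i][j] + right[i][j];
        return c;
    }
}
```

For two single-step threads, the sum produces the four-state diamond in which either thread may step first.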
In this project, we are investigating techniques that can cope with the large problem
sizes that arise in the static analysis of multi-threaded software. We have employed
lazy evaluation of Kronecker algebra operations to incorporate the synchronization semantics
of semaphores and Hoare-style monitors (i.e., Ada's protected objects), which constrains the state space
to feasible program paths. Our implementation for heterogeneous multicores substantially accelerates
these computations.
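The lazy-evaluation idea can be sketched as expression nodes that compute individual matrix entries on demand, so the exponentially large operand matrices are never materialized and only entries on explored paths are ever touched. The interface and class names are illustrative.

```java
// Sketch of lazy Kronecker evaluation: a product node computes single
// entries on demand instead of building the full result matrix.
interface LazyMatrix {
    int size();
    int get(int row, int col);
}

final class Dense implements LazyMatrix {
    private final int[][] m;
    Dense(int[][] m) { this.m = m; }
    public int size() { return m.length; }
    public int get(int r, int c) { return m[r][c]; }
}

final class LazyKronProduct implements LazyMatrix {
    private final LazyMatrix a, b;
    LazyKronProduct(LazyMatrix a, LazyMatrix b) { this.a = a; this.b = b; }
    public int size() { return a.size() * b.size(); }
    // (A (x) B)[i*m + r][j*m + s] = A[i][j] * B[r][s], computed per entry.
    public int get(int row, int col) {
        int m = b.size();
        return a.get(row / m, col / m) * b.get(row % m, col % m);
    }
}
```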
We are currently extending our static analysis framework for Kronecker algebra operations
to be able to utilize a cluster of data center nodes. This extension is necessary
to provide both the memory and the CPU resources required for the analysis of larger problem sizes.
Conducting Kronecker algebra operations in a distributed system poses new orchestration challenges.
Non-blocking Synchronization and Memory Consistency
Shared-memory multicores require a solution to the synchronization problem when
accessing shared data. Avoiding locks makes progress guarantees attainable and
demonstrably improves scalability for many concurrent data structures. Such
non-blocking concurrent data structures are difficult to devise because synchronization
is restricted to atomic primitives provided by the underlying hardware. All of today's
CPU architectures provide memory consistency models that are weaker than sequential
consistency, which further complicates the creation of non-blocking concurrent
data structures. Persistent memory is a novel addition to the memory hierarchy, which
extends the non-blocking synchronization problem from the level of coherent caches
to the underlying main memory.
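A classic example of such a structure is the Treiber stack, which builds a lock-free stack from a single compare-and-set primitive. It is shown here as a standard illustration of non-blocking synchronization, not as the project's own construct.

```java
import java.util.concurrent.atomic.AtomicReference;

// Treiber stack: the classic lock-free stack. All synchronization is a
// retry loop around the hardware compare-and-set (CAS) primitive.
final class TreiberStack<T> {
    private static final class Node<T> {
        final T value; final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }
    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    public void push(T value) {
        Node<T> oldTop, newTop;
        do {                                  // retry until the CAS succeeds
            oldTop = top.get();
            newTop = new Node<>(value, oldTop);
        } while (!top.compareAndSet(oldTop, newTop));
    }

    public T pop() {
        Node<T> oldTop;
        do {
            oldTop = top.get();
            if (oldTop == null) return null;  // empty stack
        } while (!top.compareAndSet(oldTop, oldTop.next));
        return oldTop.value;
    }
}
```

The CAS retry loop guarantees lock-freedom: if one thread's CAS fails, another thread's must have succeeded, so the system as a whole always makes progress.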
To alleviate the difficulties with low-level memory consistency models, we are working on
a high-level language primitive, the Concurrent Object, which encapsulates
the complexity of non-blocking synchronization in a language-level construct.
We are investigating the performance improvement potential of weak memory consistency
models over sequential consistency. Our initial study
on the ARM and x86 platforms revealed that the x86 platform generally benefits more from acquire-release consistency than
the surveyed ARMv8 CPU.
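Acquire-release consistency can be expressed at the language level in Java through VarHandle access modes: a writer publishes data with a release store, and a reader that observes the flag via an acquire load is guaranteed to see the data written before it. This sketch illustrates the semantics being compared; it is not the study's benchmark.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Message passing under acquire-release consistency: the release store to
// `ready` orders the plain store to `payload` before it, and an acquire
// load that sees ready == true is guaranteed to see the payload.
final class Message {
    static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(Message.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
    int payload;
    boolean ready;

    void publish(int value) {
        payload = value;                // plain store
        READY.setRelease(this, true);   // release store publishes it
    }

    Integer tryConsume() {
        if ((boolean) READY.getAcquire(this)) {  // acquire pairs with release
            return payload;
        }
        return null;                    // not yet published
    }
}
```

Because the ordering constraint is local to this acquire-release pair rather than global, such code can be cheaper than sequentially consistent accesses on some platforms.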
We are investigating non-blocking synchronization constructs in conjunction with persistent memory.
A Validated Virtual Machine for Smart Contract Execution on the Blockchain
This project targets the recent demands of safety, security and efficiency of blockchain-based smart contracts.
Each smart contract on the blockchain must be guaranteed to be safe and secure before, during, and after its execution.
This requires a virtual machine design that has been proven correct and insusceptible to attacks.
The blockchain and its decentralized peer-to-peer consensus protocol currently consume vast amounts of compute power, storage space, and energy.
A drastic efficiency-improvement is required to ensure the sustainability and wide applicability of blockchain technologies.
The main goal of our project is to analyze, validate, and improve VM design for smart contract execution on the blockchain,
from architecture and ISA to runtime implementation, in terms of three properties: (1) safety, (2) security, and (3) efficiency.
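One mechanism that ties safety and efficiency together in such VMs is gas metering: every executed instruction is charged against a budget, and execution aborts deterministically when the budget is exhausted, which bounds the resources any contract can consume. The opcodes and the uniform cost below are illustrative, not those of any real blockchain VM.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sketch of a gas-metered stack VM. Each step decrements the gas
// budget; running out of gas aborts execution deterministically.
final class MiniVm {
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    static long run(int[] code, long gas) {
        Deque<Long> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            if (--gas < 0)                 // every instruction is charged
                throw new IllegalStateException("out of gas");
            int op = code[pc++];
            switch (op) {
                case PUSH: stack.push((long) code[pc++]); break;
                case ADD:  stack.push(stack.pop() + stack.pop()); break;
                case MUL:  stack.push(stack.pop() * stack.pop()); break;
                case HALT: return stack.pop();
                default:   throw new IllegalStateException("bad opcode " + op);
            }
        }
    }
}
```

The explicit opcode dispatch and bounded budget also make the interpreter a natural target for formal validation: its state transition per instruction is small and total.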