Unraveling the Threads of Concurrency and Cryptography
This is part 1 of a series on Concurrency & Cryptography by Archita Agarwal.
Concurrent and distributed computing form the backbone of modern system architectures and infrastructures. And with security and privacy now paramount for every organization, cryptography is becoming integral to these core systems.
In this series, we will explore how to integrate cryptography into modern systems to make them more secure. We’ll delve into the challenges we’ll face when we do this, and highlight new and interesting research avenues where cryptography intersects with concurrent systems. An important objective of this series is to foster collaboration between cryptographers and concurrent system researchers to solve the big problems our systems face today.
Concurrent vs Distributed Systems
First things first: what do we mean by concurrent systems, and how do they differ from distributed systems? The two are related and often conflated, so let’s clear that up.
A distributed system is a collection of independent computers that appears to its users as one big system. These computers communicate with each other over a network, and they may be located in different physical locations. Concurrent systems, on the other hand, are systems in which many computations happen simultaneously. Now, these “multiple computations happening at the same time” can occur in both distributed and non-distributed settings. For instance, a distributed system is also a concurrent system, because different computers in the system can be handling different client requests at the same time. But a non-distributed system, such as your laptop, is a concurrent system too, because it can handle multiple tasks at the same time, thanks to its multiple cores and multi-threading.
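To make the laptop example concrete, here is a minimal Python sketch of single-machine concurrency: one process, several threads making progress at the same time, no network or second computer involved. (The task and thread count are purely illustrative.)

```python
# A minimal sketch of single-machine concurrency: one process, several
# threads, each simulating the handling of an independent client request.
import threading

results = []
lock = threading.Lock()

def handle_request(request_id):
    # Each thread records that its request was handled; the lock protects
    # the shared list from simultaneous appends.
    with lock:
        results.append(request_id)

threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The threads may finish in any order, but all four requests get handled.
print(sorted(results))  # [0, 1, 2, 3]
```

The order in which the threads run is up to the machine’s scheduler, which is exactly the kind of nondeterminism we will grapple with throughout this series.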
In our series, we will focus on non-distributed concurrent systems to keep things simple and understandable. Even non-distributed concurrent systems are ubiquitous in modern computing environments, so what we discuss here applies directly to a variety of settings. Moreover, since every distributed system is also a concurrent one, the ideas can be extended to distributed settings as well.
Concurrent Systems are Ubiquitous
Let us start with some examples of (non-distributed) concurrent systems to make our discussion more concrete. When you start a virtual machine on a cloud platform like AWS, the operating system uses concurrent data structures to manage different tasks efficiently. Whether it’s scheduling processes or managing memory, concurrent structures help things run smoothly. The same goes for databases in the cloud. They use concurrency control mechanisms to handle lots of people accessing the database at the same time. Another everyday example of non-distributed concurrent systems is your email inbox. While you’re reading messages, other people can be sending new ones.
This is all thanks to concurrency making things work smoothly and correctly. These are just a few examples of how concurrency is employed across various kinds of systems to improve their efficiency and resource utilization.
Cryptography in Concurrent Data Structures
Hopefully, it is now clear that concurrent systems are ubiquitous and that they handle sensitive information. So we would really like to integrate cryptography into these systems to enhance their security. Instead of starting with complex systems like DBMSs or operating systems which have thousands of components, we will start small with something simpler, like concurrent data structures. If we can integrate cryptography into data structures, then we can build more complex systems on top of them. It will also help us restrict the problem and focus on foundational issues that will help us understand the intricacies of merging cryptography with concurrency.
As we will see, this will also lead to new research directions that are motivated by real-world problems and that are directly relevant to commercial systems like MongoDB’s Queryable Encryption. These problems will vary from theory-oriented efforts to establish the field’s foundations, to more algorithmic endeavors like designing encrypted concurrent data structures, and further to applied research where libraries are developed for engineers to deploy these structures in real systems.
Multi-Client Protocols Are Not (Necessarily) Concurrent
Before we get started, we need to address a common misconception: that multi-client protocols are inherently concurrent. They are not. Just because a cryptographic protocol can handle multiple clients doesn’t automatically make it concurrent.
Real concurrency happens on the server side, where operations run simultaneously. For instance, one operation starts, does its thing for a while, then while it is waiting for an expensive disk read, it lets another operation execute, and so on. Now, multi-client cryptographic protocols typically only ensure that multiple clients can do their thing (even if it’s one after the other), thanks to the protocol’s design. For instance, in a multi-client Symmetric Searchable Encryption (SSE) scheme, we want documents added by one client to be searchable by another, but it doesn’t necessarily mean that searches and updates can be executed side by side.
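To illustrate the distinction, here is a toy Python sketch of a server that supports multiple clients yet executes nothing concurrently: a single global lock serializes every operation. (The class and method names are invented for illustration; they are not from any real scheme.)

```python
# Sketch: multi-client support does not imply concurrency. This server
# accepts operations from any number of clients, but a single global lock
# forces them to execute strictly one after another.
import threading

class SerializedServer:
    def __init__(self):
        self._lock = threading.Lock()
        self._store = {}

    def update(self, label, value):
        with self._lock:  # only one operation runs at a time
            self._store.setdefault(label, []).append(value)

    def search(self, label):
        with self._lock:
            return list(self._store.get(label, []))

server = SerializedServer()

# Two "clients" issue updates at the same time; both are served correctly,
# but the server never executes their operations side by side.
t1 = threading.Thread(target=server.update, args=("doc", "alice"))
t2 = threading.Thread(target=server.update, args=("doc", "bob"))
t1.start(); t2.start(); t1.join(); t2.join()

print(sorted(server.search("doc")))  # ['alice', 'bob']
```

Both clients get correct answers, so the protocol is perfectly usable in a multi-client setting, yet there is no server-side concurrency at all.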
Now, whether the concurrent operations come from the same client or from multiple clients doesn’t matter much when we’re concerned with concurrency on the server. In a single-client setup, one client can issue a bunch of operations all at once, and the server will handle them by spinning up a thread for each operation, getting everything done simultaneously. In a multi-client setup, operations from different clients can run side by side. The choice of model depends on what you’re building. For instance, if you’re instantiating a virtual machine on AWS, you’re probably in a single-client setup because you are the only one using the virtual machine. But if you’re a MongoDB customer such as a bank, you might need the multi-client setup to allow different bank branches to operate on the database at once. We might switch between the single-client and multi-client settings in our discussion depending on what makes more sense.
A New Setting
Now, let’s set the stage. In our scenario, we have either a single or multiple clients interacting with a multi-threaded server. Clients upload their data to the server in a data structure and interact with it through various operations. To properly model the multitasking ability of servers in our setting, we need to change our adversarial model and correctness guarantee from what we typically see in cryptography.
Adversarial Model
Our adversary will be similar to the standard persistent honest-but-curious adversary considered in structured encryption. Specifically, the adversary corrupts the server and observes everything, including memory, disk, and the messages exchanged between the client and server during the execution of operations. But in addition to this, the adversary also controls the execution schedules of operations. That is, it decides how the executions of operations are interleaved, effectively capturing the asynchronous character of concurrent systems.
In this setting, our goal will be to design protocols that guarantee “security” even in the presence of an adversary that not only passively observes what happens at the server, but also actively chooses the execution schedule of operations.
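One way to picture schedule control is to break each operation into low-level steps and let the adversary choose which operation advances next. The following toy Python model uses generators to stand in for operation steps; the operation names, labels, and the two-step split of append are all illustrative, not part of any formal definition.

```python
# A toy model of an adversarial scheduler: each operation is broken into
# low-level steps (one step per `yield`), and the adversary picks which
# operation advances next.

def append_op(store, label, value):
    # Two-step append: read the current tuple, then write it back extended.
    current = store.get(label, [])          # step 1: read
    yield
    store[label] = current + [value]        # step 2: write
    yield

def get_op(store, label, out):
    out.append(list(store.get(label, [])))  # single step: read and return
    yield

store = {"l": ["v1", "v2"]}
out = []
ops = {"append": append_op(store, "l", "v3"), "get": get_op(store, "l", out)}

# Adversary-chosen schedule: append's read, then the whole get, then
# append's write. The get lands in the middle of the append.
schedule = ["append", "get", "append"]
for name in schedule:
    next(ops[name], None)

print(out)        # [['v1', 'v2']]       -- the get saw the old tuple
print(store["l"]) # ['v1', 'v2', 'v3']   -- the append still completed
```

A different schedule (say, running both append steps before the get) would make the get return the extended tuple instead, so the adversary’s scheduling choices directly shape what operations observe.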
Correctness
Another notion that needs to be adjusted to properly capture concurrent structures is correctness, because the traditional notions of correctness in cryptography are not meaningful in our setting.
Consider a multi-map that stores a label/tuple pair \((\ell,v)\), where \(v = (v_1, \dots,v_{10})\), an append operation on \((\ell,v_{11})\) and a get operation on \(\ell\). Now suppose the append and get operations are concurrent. Should we require the get operation to return \((v_1,\dots, v_{10})\) or \((v_1,\dots, v_{11})\)? Either answer is acceptable depending on how the lower-level instructions of the operations are scheduled.
Because of this, correctness in concurrent settings is replaced with the notion of consistency, which guarantees that the outputs of the operations are consistent with some sequential order of the operations, even if their instructions are actually interleaved. Returning to the example, if the get outputs \((v_1,\dots, v_{11})\) then its output is consistent with the sequential order (append, get), whereas if it outputs \((v_1,\dots, v_{10})\) then its output is consistent with the sequential order (get, append). Many different notions of consistency have been defined and studied for concurrent systems. In our model of encrypted concurrent structures, it is important that we incorporate these consistency notions, especially since, as we will see in later posts, consistency guarantees can affect security guarantees.
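The example above can be turned into a small mechanical check: an observed get output is consistent if *some* sequential ordering of the operations produces it. Here is a toy Python sketch of that check for the two-operation multi-map example; the function names and the hard-coded value \(v_{11}\) are ours, chosen purely for illustration.

```python
# Toy consistency check: is the observed output of a concurrent get
# explained by SOME sequential ordering of the operations?
from itertools import permutations

def run_sequentially(order, initial):
    # Replay the operations one after another in the given order and
    # record what the get would return.
    mm = {"l": list(initial)}
    get_result = None
    for op in order:
        if op == "append":
            mm["l"].append("v11")
        else:  # "get"
            get_result = list(mm["l"])
    return get_result

def is_consistent(observed_get, initial):
    return any(observed_get == run_sequentially(order, initial)
               for order in permutations(["append", "get"]))

initial = [f"v{i}" for i in range(1, 11)]           # (v1, ..., v10)
print(is_consistent(initial, initial))               # True: order (get, append)
print(is_consistent(initial + ["v11"], initial))     # True: order (append, get)
print(is_consistent(initial[:5], initial))           # False: no order explains it
```

Both of the answers discussed above pass the check, while a truncated tuple fails: that is exactly the sense in which either \((v_1,\dots,v_{10})\) or \((v_1,\dots,v_{11})\) is acceptable but nothing else is. Stronger notions like linearizability add further constraints (e.g., respecting real-time order) on which sequential orders are admissible.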
Preview of the Rest of the Series
In the upcoming posts, we will focus on designing concurrent data structures that are not only efficient and secure but also consistent. Secure in the sense that an adversary that controls the machine, including its scheduler, does not learn anything about user data or queries. And consistent in the sense that they satisfy a well-defined notion of consistency.
We’ll discuss a specific encrypted multi-map construction and prove that it is secure and satisfies a very strong notion of consistency called linearizability. Through this example, we’ll highlight the challenges of integrating techniques from both fields, since common techniques from concurrent data structures may inadvertently result in leakage about user data or queries. Additionally, we’ll discuss real-world constraints that need to be considered when developing deployable solutions, such as the problems that client state can pose in cloud systems.
The process of designing concurrent encrypted structures is challenging, but assessing their security is even more intricate and nuanced, precisely because consistency and security interact. To handle this, we will need to translate high-level consistency and security notions into precise technical definitions, so that we can carry out both formal security analyses of the kind cryptographers conduct and consistency proofs of the kind found in concurrent systems.
Interestingly, we will also see why it is not sufficient to consider consistency and security separately; instead, they must be examined in tandem. By addressing them together, we gain a better understanding of the intricacies involved in designing these structures. In the end, we will also discuss interesting foundational questions about the relationship between security, consistency, and efficiency.