The SecreC programming language
ℹ️ This tutorial is adapted from the dissertation "Programming Languages for Secure Multi-party Computation Application Development" by Jaak Randmets (2017).
SecreC 2 is a programming language designed for implementing data analysis algorithms on the Sharemind platform. The main goal of the language is to make the process of implementing secure computation algorithms easy for programmers who are not familiar with secure computation or cryptography in general. We achieve this by
- providing an imperative C-like syntax which is ubiquitous in data analysis literature and is generally far more familiar to programmers than other computing paradigms;
- hiding that the program will possibly be run on multiple computers;
- syntactically not differentiating secure computations from regular public computations, when it is reasonable to do so;
- offering generic programming facilities that enable code reuse across public computations and different security schemes; and
- providing a standard library that builds complex functionality on top of simple primitives, thereby allowing security schemes to only implement a small number of basic operations.
However, the language remains domain-specific. We found that general-purpose languages are not a good fit for secure computation. Firstly, in MPC the programmer has to indicate whether data is public or private, and data can be implicitly converted only from the public to the private domain, not the other way around. Treating all data as private can incur too great a performance cost. Secondly, we provide rich array processing facilities to make code vectorization easier, which, in turn, enables common performance optimizations. Performance has dictated the language design in more than one way: we also do not support implicit type conversions, because they can be extremely costly in non-public domains. There are many more subtle differences motivated by security: array accesses are always dynamically checked, global state is shunned (but still supported), pointers and references are not supported, and so on.
In the Language overview section we describe SecreC to readers who are interested in learning the language or who only want a high-level overview. The Writing efficient SecreC section discusses how to write code that performs well. Due to very large constant factors and latencies, performance is an extremely important consideration in secure computing. While we talk about the SecreC language, the techniques are generally applicable to the secure computation setting.
Hello world
The following listing is the classical "Hello, world!" program written in SecreC.
void main() {
    print("Hello, World!");
}
If it is saved as a file named hello-world.sc then it can be compiled using the SecreC compiler with the following command:
scc --input hello-world.sc --output hello-world.sb
The output file hello-world.sb is compiled bytecode which is interpreted by the virtual machine inside the Sharemind Application Server. The bytecode file has to be copied to the scripts folder of the server installation. The default path is /var/lib/sharemind/programs/.
The program can then be run using:
sharemind-runscript --run hello-world.sb
Language overview
Familiar syntax
A major goal of SecreC is to offer familiar syntax. It is unrealistic to expect users to be fluent in functional languages or in complex type systems that provide strong security guarantees. For this reason SecreC has a C-like syntax, which is ubiquitous amongst popular programming languages. In fact, just by looking at a complete "Hello, World!" program written in SecreC it is impossible to tell that the language is somehow unique or related to MPC.
// Simple "Hello, World!" program.
void main() {
    print("Hello, World!");
}
SecreC supports regular public computations the syntax of which is very
close to other C-like languages. Arithmetic, bitwise, logical and
relational operations are all supported. All integer data types are
strictly sized from 8 to 64 bits and include signed and unsigned
variants. We also support Booleans, strings, floating-point values and
arrays of primitive data types. Basic support for user-defined
structures is also available. The following listing demonstrates simple arithmetic on signed 64-bit and unsigned 32-bit integers. Note that without the explicit type conversion when initializing the variable z, the code will not compile and a type error will be raised.
void main() {
    int64 x = 25;
    int64 y = -10;
    print("x + y = ", x + y);
    print("x * y = ", x * y);
    uint32 z = (uint32) (x - (x / y));
    print("z = ", 1 + 8 * z);
}
A familiar C-like syntax for branching, looping and user-defined functions is available. Function parameters are always passed by value. This includes structures, arrays and non-public values. See the following example on defining and calling functions. Note that the 27 :: uint32 syntax is used to specify that the constant 27 is of 32-bit unsigned integer type. We could have also used the C-like cast (uint32) 27, but the former guarantees that no type conversion happens.
uint32 nextCollatz(uint32 n) {
    if (n % 2 == 0)
        return n / 2;
    else
        return 3 * n + 1;
}

void printCollatz(uint32 n) {
    while (n != 1) {
        print(n);
        n = nextCollatz(n);
    }
}

void main() {
    printCollatz(27 :: uint32);
}
Performing secure computations
SecreC distinguishes between public and private data at the type system level. Every type in SecreC has two components: the data type and the security type. When not specified, the security type is implicitly assumed to be public. To declare private data it is not sufficient to state that the data is simply "private". The programmer has to be explicit about what kind of security scheme is used and has to specify the name of the concrete deployment of this scheme. This is because Sharemind MPC facilitates the use of many different security schemes and even the concurrent use of multiple instances of a given scheme.
Having an omitted security domain default to public is not a security issue. The type system guarantees that private data is never implicitly converted to public data. Even if the programmer accidentally leaves out a non-public domain, a type error will be raised. It is possible to construct an artificial example where this is not true, but it would have to involve (indirect) declassifications. Another reason for having the default domain be public is that SecreC programs contain a fair share of public code. For example, loop counters and array indices are usually public.
To see this in action, consider the code in the following listing where we first declare a security scheme (or kind) called shared3p and then declare a concrete security domain of that kind. In the main function we declare two variables that are in the shared3p domain, compare them, reveal the result, and print it. This example only involves computing parties. The program does not receive input from input parties or produce results for output parties. The computing parties only log the result of the declassification.
kind shared3p {
    type bool;
    type uint;
}

domain pd_shared3p shared3p;

void main () {
    pd_shared3p uint x = 3;
    pd_shared3p uint y = 5;
    public bool z = declassify(x < y);
    print("x < y = ", z);
}
Private variables are defined similarly to public ones, except the security type is written before the data type. Explicitly writing the public domain is allowed. There are a few things of note in this example: public values can be implicitly converted to private, the regular syntax for comparison can be used to invoke secure comparison, and conversion to the public domain requires an explicit call to the declassify function. In order for the Sharemind Application Server to be able to execute the code above, it has to be configured to supply the mentioned security scheme and a concrete instance of it. The security schemes are implemented via loadable modules and concrete instances (so-called domains) have to be explicitly configured. To configure a protection domain a user has to specify the domain's name, which concrete servers are involved, and a loadable module that associates operation names with executable code. The details of server configuration are not included in this tutorial.
Input and output
In real applications private variables are rarely initialized directly from public ones like we just saw. A more realistic program would be one that, for example, gets the value of x from one client and y from another. As a result of executing the example, both input providers would be notified whether x is less than y. This can be considered an implementation of the Millionaires' problem[1]. SecreC 2 code for such an application is presented in the following listing. Instead of declaring a protection domain kind we use the appropriate standard library module that already declares that kind and defines various operators and functions for it. Inputs are no longer hard-coded and instead are given as arguments to the program. The result is published to output parties.
import shared3p;

domain pd_shared3p shared3p;

void main() {
    pd_shared3p uint x = argument("IncomeAlice");
    pd_shared3p uint y = argument("IncomeBob");
    publish("result", x < y);
}
Notice that the example still only directly involves computing parties. It does not state where the "IncomeAlice" and "IncomeBob" arguments come from or who receives the published result. This logic is implemented outside of SecreC 2. The very same code could be used in many different applications. In one case it could be used for an application that involves three parties, two of which provide input and all of which receive the output. Or it could also involve a web interface for supplying data. In that case input, computing and result parties could all be distinct. For a real-world example of how MPC can be deployed for web applications see the dissertation[2] of Riivo Talviste.
Private conditionals
Private values can be used exactly like public ones in most places, such as expressions, function arguments, variable declarations, and return values. However, they cannot be branched over. For example, if we want to conditionally assign a secret value y to a secret value x depending on a private Boolean b, we are not allowed to write if (b) { x = y; }. The same result can be achieved by evaluating x = (uint) b * (y - x) + x, which assigns y to x if b is true and leaves x unchanged otherwise. This pattern can be abstracted into a function.
Many MPC programming languages, for example ObliVM[3] and PICCO[4], allow branching over secure Booleans in some restricted cases. We have decided not to adopt this approach, mainly because we have not had the need for it. While private conditionals do help developers a little, in our experience the development effort does not increase significantly in their absence. There are some other, less important reasons for not having private conditionals.
- When branching over secure Booleans, it is not possible to perform any public side effects such as printing output or performing public assignments. The type system or program analysis would need to detect such conditions. Encoding such information in types makes the language more complicated. On the other hand, achieving the same results via program analysis hides from the programmer whether a function can be used under a private conditional or not.
- Secure branches are significantly slower than public ones because both branches have to be evaluated and the correct answer then selected based on the private Boolean. This argument would be weak if private conditionals were indistinguishable from public ones, but they are not.
- Supporting secure conditionals raises the question of why not also provide support for secure looping with an upper bound on the number of iterations.
In summary, supporting conditionals over secure Booleans adds value by helping the programmer, but it also complicates the language, its type system, and the compiler implementation. Having all control flow be public leads to a more approachable language and a simpler compiler. Given that private conditionals are not overly common, we opted not to offer convenient syntax for them. This limitation is conservative in the sense that it can be relaxed in a backwards-compatible manner if it turns out that private conditionals are frequently needed.
Protection domains and kinds
We provide two definitions that explain our motivation behind the choice of keywords for declaring a security scheme and its instances.
A protection domain kind (PDK) is a set of data representations, algorithms and protocols for computing on and storing protected data.
A protection domain (PD) is a set of data that is protected with the same resources. There is a well-defined set of algorithms and protocols for computing on that data while keeping the protection.
Each protection domain belongs to one protection domain kind and each kind may have several protection domains. A classical example of a PDK is secret sharing with an associated data representation and protocols for construction, reconstruction and arithmetic operations on shared values. A PD in this PDK would specify the actual parties doing the secret sharing. Another example of a PDK is fully homomorphic encryption with operations for encryption, decryption and algorithms for performing arithmetic operations such as addition and multiplication on encrypted values. For the FHE PDK, each PD is associated with a secret key. It is also possible to consider non-cryptographic methods for implementing PDK-s using trusted hardware or virtualization. Intel® SGX offers an example of a practical PDK with a single physical computer evaluating a program.
In general a PDK has to provide:
- A list of data types supported by the PDK.
- For each data type in the PDK:
  - a classification function for converting public data to the protected representation and a declassification function for revealing private data,
  - protocols or functions for operating on protected values.
- The protocols performing secure computation should be universally composable so that they can be safely combined into programs without losing security guarantees.
Here we state that any PDK needs to allow its data to be converted to and from the public domain. This is not strictly necessary and a PDK may instead allow converting data to and from some other PDK. However, we have not found a practical case where restricting public conversions is useful.
Array processing
Rarely do applications of secure computation deal with single values. More often the goal is to learn some aggregate information from a large set of data without revealing the individual values. Naturally, to process many records a programming language has to support arrays or some other form of collection. As we will later see, solid support for data-parallel processing is also an attractive feature. SecreC supports arbitrary-dimensional arrays with publicly known sizes. Elements are accessed and modified via public indices.
A simple aggregation procedure is to count the number of occurrences of a value in a set of data. This can easily be achieved, as shown in the following listing, by iterating over the input array, comparing with the value, and incrementing the counter whenever a match is found. Counter increments are achieved by casting the result of the equality check to an integer and adding that to the counter. When the comparison results in true the counter is incremented by 1 and otherwise by 0. Double square brackets, such as [[1]], denote the array's dimensionality (the number of dimensions). For instance, public integer matrices are represented with int[[2]]. We use double square brackets to be distinct from C where int[4] denotes a one-dimensional array of length four. Stacking multiple square brackets as int[][][][] would mean that special syntax would be needed for dimensionality-generic code.
pd_shared3p uint count(pd_shared3p uint32[[1]] data,
                       pd_shared3p uint32 value) {
    pd_shared3p uint matchCounter = 0;
    for (uint i = 0; i < size(data); ++i)
        matchCounter += (uint) (data[i] == value);
    return matchCounter;
}
SecreC puts a lot of emphasis on array processing and tries to make it convenient. For example, many language constructs are lifted to operate pointwise on arrays. One can add two arrays using the exact same syntax as for adding two scalars. Binary arithmetic also allows one operand to be an array and the other a scalar. Other than being less verbose, vectorized operations are in many situations more efficient than loops of scalar operations. It is possible to rewrite the counting function in a more compact style by comparing the value to the data pointwise and summing the resulting Booleans as sum(data == value), where sum is a function defined in the standard library. We will later see that, in addition to conciseness, this version also performs significantly better than the one in the last listing.
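As an illustration, the vectorized counting function can be written as the following sketch, relying only on the pointwise comparison and the standard library sum mentioned above.
pd_shared3p uint count(pd_shared3p uint32[[1]] data,
                       pd_shared3p uint32 value) {
    // Compare pointwise and sum the resulting Boolean vector in one pass.
    return sum(data == value);
}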
Reshaping and moving data between arrays is very common, and to avoid writing verbose loops for such a typical task we support indexing arrays with ranges. Given a one-dimensional array arr, the expression arr[b:e] results in an array of length e-b where the elements are taken from the range of indices from b to e-1. If omitted, the lower bound is implicitly assumed to be 0 and the upper bound is assumed to be the length of the array. This syntax extends naturally to arbitrary-dimensional arrays and can be mixed with regular indexing. For example, the expression mat[1, :] results in the second row of the matrix mat.
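The following sketch illustrates range indexing on small public arrays; the concrete values are only for illustration.
int[[1]] arr = {1, 2, 3, 4, 5};
int[[1]] middle = arr[1:4];     // elements at indices 1, 2 and 3: {2, 3, 4}
int[[1]] prefix = arr[:2];      // {1, 2}
int[[1]] suffix = arr[3:];      // {4, 5}
int[[2]] mat(2, 3);             // a 2x3 matrix, initially filled with zeroes
int[[1]] secondRow = mat[1, :]; // the second row, of length 3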
It is possible to change the number of dimensions and the dimension sizes of an array. The only requirement is that the number of elements of the resulting array be the same as in the input array. For example, a matrix mat can be flattened into a one-dimensional array with the reshape(mat, size(mat)) expression. More generally, reshape takes an arbitrary number of arguments after the first one that indicate the shape of the resulting array. SecreC guarantees that arrays are stored in generalized row-major order. When we flatten a matrix, its first row occurs first in the resulting array, then the second row, and so on. Scalars are allowed to be reshaped into any shape or size.
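A small sketch of reshape in action on public data (the values are again only illustrative):
int[[1]] values = {1, 2, 3, 4, 5, 6};
int[[2]] mat = reshape(values, 2, 3);    // a 2x3 matrix with rows {1, 2, 3} and {4, 5, 6}
int[[1]] flat = reshape(mat, size(mat)); // back to {1, 2, 3, 4, 5, 6}, in row-major order
int[[2]] tall = reshape(mat, 3, 2);      // the same elements arranged as a 3x2 matrix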
Protection domain polymorphism
Sharemind is not limited to using a single protection domain kind and can even support the use of multiple protection domains concurrently. This feature makes it possible to reuse code between different security schemes. For example, the function for counting occurrences in an array can be implemented for any scheme that supports comparison, addition, and conversion from Booleans to integers. However, the earlier count implementation is not usable across different security schemes but only on one particular instance of the additive three-party scheme.
To enable reuse across security schemes, SecreC supports protection domain polymorphism using syntax similar to C++ templates. A generic version of the counting function is presented in the following listing where the function has been made polymorphic over any protection domain D but the rest of the code remains unchanged compared to the previous implementation. Note that this includes the public security domain, making the function also usable on public data. In many cases algorithms have a single implementation across multiple protection domains.
template <domain D>
D uint count(D uint32[[1]] data, D uint32 value) {
    D uint matchCounter = 0;
    for (uint i = 0; i < size(data); ++i)
        matchCounter += (uint) (data[i] == value);
    return matchCounter;
}
Template functions are type checked similarly to C++ templates. Type correctness of a template function body is verified only when the function is instantiated for some concrete protection domain. The definition itself is only syntactically verified. Unfortunately this means that when the counting function is called on a protection domain that does not support equality, a type error is raised at the location of the comparison and not where the function is called from. Some other languages overcome a similar problem by restricting type parameters to only types that are known to offer some necessary functionality (such as equality). The restriction can be statically verified at the call site and an error is raised when the constraints are not satisfied. For example, Haskell has type classes, Rust has traits, and C++ has concepts. All these solutions add considerable complexity to the language. Thus, we find it an acceptable tradeoff to not have any form of bounded quantification. SecreC programs are at most medium-sized and typically span less than a few thousand lines of code.
Often it is not sufficient to provide a single implementation of a particular function that works on all protection domains. Sometimes we want a faster implementation for a certain protection domain, or the default implementation uses operations that a domain does not support. Function overloading can typically be used to solve those cases.
Consider the function choice for choosing a value based on a Boolean. It takes a Boolean value and two arguments, and returns the first argument if the Boolean is true and the second argument if the Boolean is false. SecreC does not support regular if-expressions over secure conditionals; thus, this kind of function is often useful for filling the role of branching. In the following listing the first definition is polymorphic over any protection domain but the second one is restricted to a domain based on Yao's garbled circuits scheme. The first definition can be considered a more generic default implementation and it relies on multiplication, addition and subtraction. However, for Yao's scheme multiplication can be a rather costly operation that we want to avoid. Instead, in the case of the yao2p kind, the Boolean is converted to a suitable bitmask and the result is computed via bitwise conjunction and XOR. Note that the second implementation is also not a good default as it is not efficient for schemes based on additive secret sharing. The polymorphic overload that is specialized for the yao2p protection domain kind works for any concrete domain of that kind.
template <domain D>
D uint32 choice(D bool b, D uint32 x, D uint32 y) {
    return (uint32) b * (x - y) + y;
}

template <domain D : yao2p>
D uint32 choice(D bool b, D uint32 x, D uint32 y) {
    D uint32 mask = -(uint32) b;
    return mask & (x ^ y) ^ y;
}
When multiple matching definitions are found for the function being called, the most restricted match will be selected. In the current example, the second definition has a more restrictive signature than the first. When multiple equally good matches are found, a type error is raised.
We have built a standard library using polymorphic functions, overloading and modules. The code is structured into a generic stdlib module and specialized modules for each protection domain. The stdlib module defines functions that are polymorphic, with some overloaded on the public domain. Each protection domain kind module can overload the standard library functions and provide the user with more efficient implementations. For example, we supply the shared3p module and some accompanying modules that provide more efficient operations.
In addition to protection domain polymorphism, we also allow functions to be generic over data types and array dimensionalities. For example, the function defined in the following listing reshapes an arbitrary-dimensional array of any type in any protection domain into a one-dimensional array.
template <domain D, type T, dim N>
D T[[1]] flatten(D T[[N]] arr) {
    return reshape(arr, size(arr));
}
Writing efficient SecreC
Efficiency is often a concern in secure computation. Secure algorithms that are implemented directly based on some public version can be infeasibly slow due to high round counts or impractical network bandwidth requirements. This is a problem because most secure computation protocols require network communication. Some security schemes require a constant number of communication rounds but high network bandwidth, and some schemes have low bandwidth requirements but need to perform many communication rounds. In either case, a programmer must take care when implementing secure algorithms.
In this section we provide some generic techniques that help improve the performance of secure programs. We have the following suggestions, which we elaborate on below and show how they are applied in SecreC.
- Prefer parallel execution to sequential execution. Even for reasonably large inputs, applications should take a sublinear number of communication rounds.
- In many cases the SIMD style of parallelism is sufficient, but sometimes more involved techniques need to be applied to lower the number of communication rounds. For example, associative operations can be aggregated in a logarithmic number of rounds.
- Prefer operations that are not costly in the chosen protection domain. For example, in additive schemes addition does not require network communication.
Parallel execution and SIMD
SecreC operates on the assumption that the underlying computation protocols are universally composable. This property allows secure operations to be executed either in parallel or sequentially without loss in security guarantees.
When we execute two consecutive multiplications, we invoke the underlying secure multiplication protocol twice in sequence. Regardless of how many communication rounds the underlying multiplication protocol performs, two sequential multiplications take about twice as much time as one. Compared to regular computers where operations take nanoseconds, in secure computation operations have latencies of several milliseconds because of network communication. This is a difference on the order of \(10^6\).
Such a drastic overhead might make secure computing sound completely infeasible. However, instead of comparing only instruction latencies, we should also look at parallel execution. Namely, in a networked setting it is possible to perform thousands of operations in parallel in the same amount of time that it takes to execute a single operation. This is because the network is optimised for high-bandwidth usage. Thus, for reasonably efficient applications sequential execution should be avoided and parallel execution should be preferred.
SecreC has opted for data parallelism (performing one task on many pieces of data at the same time) over task parallelism (performing many tasks at the same time). Most operations in the language can be applied to arrays in which case the operation is executed pointwise on the data. This is also known as the single instruction, multiple data or SIMD approach.
To demonstrate the issue with high latency let us look at an implementation of dot product as usually written in a regular imperative programming language. The common solution is to iterate over both arrays adding the product of the corresponding elements to the accumulator. In most security schemes the multiplication operation requires the execution of a protocol that takes at least a single communication round. The iterative solution would then take a linear number of communication rounds with respect to input length. Due to network latencies this makes the algorithm infeasible to use in real applications with non-trivial input sizes.
template <domain D>
D uint32 dotProduct(D uint32[[1]] x, D uint32[[1]] y) {
    assert(size(x) == size(y));
    D uint32 result = 0;
    for (uint i = 0; i < size(x); ++i)
        result += x[i] * y[i];
    return result;
}
To optimize the previous example we notice that we can multiply the input arrays pointwise. This is a data parallel operation and takes the same number of rounds as multiplication of scalar values. The result of the product is stored in a temporary array that is then summed. In the additive scheme, integer addition is a local operation; thus, the following implementation is efficient enough because iterative addition does not increase the round count of the procedure.
template <domain D>
D uint32 sum(D uint32[[1]] arr) {
    D uint32 result = 0;
    for (uint i = 0; i < size(arr); ++i)
        result += arr[i];
    return result;
}

template <domain D>
D uint32 dotProduct(D uint32[[1]] x, D uint32[[1]] y) {
    return sum(x * y);
}
Round-efficiency
In some schemes, such as Boolean circuit based ones, addition is not a local operation, meaning that sequential addition still incurs a linear round count. This shortcoming can be overcome by iteratively performing data-parallel additions between the lower and upper half of the array until a singleton array remains. For an input of length \(n\) this procedure takes \(O(\log n)\) communication rounds. A possible implementation that also takes care of non-power-of-two inputs is provided in the following listing. We could use round-efficient summation as the default implementation but it turns out to be slower in the case of public data and security schemes where addition is a local operation.
template <domain D : yao2p>
D uint32 sum(D uint32[[1]] arr) {
    uint n = size(arr);
    if (n == 0) return 0;
    while (n > 1) {
        uint smallHalf = n / 2; // Rounds down
        uint bigHalf = n - smallHalf;
        arr[:smallHalf] += arr[bigHalf:n];
        n = bigHalf;
    }
    return arr[0];
}
This technique is generic and is applicable to any associative binary operation. For example, secure floating-point addition as implemented by Kamm and Willemson[5] performs multiple communication rounds. Summing a sequence of floating-point values is a common operation that is often performance-critical. While floating-point addition is not strictly associative, it is nevertheless very useful to apply this technique for performance reasons.
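As a sketch, the same halving pattern could be applied to floating-point summation in a shared3p domain. This is only an illustration of the technique; the standard library provides its own summation routines.
template <domain D : shared3p>
D float64 sumFloat(D float64[[1]] arr) {
    uint n = size(arr);
    assert(n > 0);
    while (n > 1) {
        uint smallHalf = n / 2; // Rounds down
        uint bigHalf = n - smallHalf;
        // One data-parallel addition per iteration, O(log n) iterations in total.
        arr[:smallHalf] += arr[bigHalf:n];
        n = bigHalf;
    }
    return arr[0];
}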
Security scheme specific techniques
Parallelism and round-reduction techniques are useful tools in general, but they are especially important in a secure setting because of the very high constant factors involved. In many cases it is also possible to exploit the properties of the security scheme to achieve even better performance. We shall see a few of these techniques here and take a look at how they can be implemented in SecreC.
Consider the task of checking whether at least one value in a Boolean array is true. An implementation that uses a parallel algorithm to compute the disjunction of all elements takes \(O(\log n)\) rounds for an \(n\)-element input array. In practice this is a perfectly reasonable implementation but it is possible to do better. Asymptotically better round complexity is achieved by converting the Boolean array to additively shared integers, adding up the integers, and checking whether the sum is non-zero. This exploits the fact that addition of additively shared integers is a local operation. In order not to overflow the sum we must use at least \(\lceil \log n \rceil\)-bit integers. Because conversion from Boolean to integer takes a constant number of rounds and integer comparison has logarithmic round complexity, this yields an algorithm with \(O(\log \log n)\) round complexity. Our implementation in the following listing uses 64-bit integers, which is sufficient for all practical purposes.
template <domain D : shared3p>
D bool any(D bool[[1]] arr) {
    return sum((uint64) arr) != 0;
}
The Performance of shared3p protocols section in the SecreC reference contains performance measurements of shared3p protocols. This information can be useful for optimising functions for shared3p inputs.
It is difficult to adapt many common algorithms to the secure computation setting. For example, most popular sorting algorithms branch their control flow depending on the results of comparisons between the elements of the input. These sorting algorithms cannot be adapted to the secure computation setting in a straightforward manner because such control flow dependencies leak information, namely the ranks of the elements and, thus, the relative ordering of the secure inputs. However, if the original ordering of the data is hidden by randomly shuffling (permuting) the input then the control flow does not reveal as much information. In fact, if all input values are unique then no information is revealed[6][7].
A useful and efficient technique available for many different secret sharing schemes is to randomly shuffle the input to hide data dependencies. The random shuffling scheme proposed by Laur, Willemson and Zhang[8] is constant in round complexity and \(O(n\log n)\) in communication complexity. Thus, this primitive facilitates the implementation of \(O(n \log n)\) sorting algorithms. Many practical data-oblivious sorting algorithms exist — bitonic sorter[9], Batcher odd–even mergesort[9], pairwise sorting network[10] — but they all have \(O(n \log^2 n)\) time complexity, take \(O(\log^2 n)\) rounds, and are not stable. Numerous applications are built on top of oblivious sorting[11]. The shuffling primitive has also found use in oblivious database linking[12].
Trade-off between efficiency and privacy
When writing MPC code there is almost always a trade-off between efficiency and security. An implementation that does not reveal any information can be impractically slow. By revealing some sensitive information that is deemed acceptable to leak, we can speed up the program significantly. One such example is sorting, where the best practical oblivious algorithms take \(O(n \log^2 n)\) time. When it is acceptable to reveal the results of comparisons it is straightforward to adapt any \(O(n \log n)\) sorting algorithm to the secure computation setting[7][6]. We shall see how SecreC gives the programmer the flexibility to make trade-offs between efficiency and privacy.
As a small case study we will implement quickselect[13] (also known as Hoare’s selection algorithm) for finding the \(k\)-th smallest element of an unsorted sequence. On average the public algorithm runs in linear time. In this section we show how a naive secure implementation of the algorithm has \(O(n^2)\) running time but by revealing a little information about the input sequence we can achieve \(O(n \log n)\) average-case complexity, and by revealing even more we get an implementation that works in linear time.
Public quickselect operates similarly to quicksort. The algorithm operates iteratively on a working list. In every iteration we pick a pivot from the list and split the remaining list into two parts: elements smaller than the pivot, and the rest of the elements. Let the list with smaller elements have \(l\) elements. If \(k = l\) then we are done as the pivot is the \(k\)-th smallest element (we start counting from \(0\)). If \(k < l\) then the answer can be found in the list with smaller entries, and we just drop the larger elements and continue. If \(k > l\) then the element can be found in the list with larger entries. In this case we drop the smaller elements and continue to look for the \((k - l - 1)\)-th smallest element in the remaining sequence.
Usually, the development of a privacy-preserving algorithm starts from a public implementation or description. In the following listing we have implemented quickselect in SecreC so that it only operates on public data. It mostly follows the algorithm description from earlier but we have already given some forethought to the secure implementation. Namely, we perform all comparisons between the pivot and the rest of the list in parallel to keep the round count minimal. The results of the comparison are stored in the mask array and the variable \(l\) is computed by summing the bit vector. If \(k > l\) we flip the mask. After the checks we filter the list according to the mask. To keep the presentation simple, we have not optimized it: the algorithm does not operate in-place and traverses the sequence more often than strictly needed.
int quickselect(int[[1]] list, uint k) {
    assert(k < size(list));
    int pivot;
    while (true) {
        pivot = list[0]; // Random selection is better
        list = list[1:];
        bool[[1]] mask = list < pivot;
        uint l = sum(mask);
        if (k == l) break;
        if (k > l) mask = ~mask;
        if (k > l) k = k - l - 1;
        list = filter(list, mask);
    }
    return pivot;
}
Given an \(n\)-element input list, in the worst case our implementation may take \(n\) iterations, leading to an \(O(n^2)\) algorithm. Consider the case where the list is already sorted and we are looking for the \((n-1)\)-th smallest element. In such a case, during every iteration, the pivot is chosen to be the first element of the running sequence. Because it is the smallest one, no elements other than the pivot are eliminated from consideration. We can improve the algorithm by selecting the pivot randomly, but this does not improve the worst case as every time the smallest element could be selected. A perfectly secure version has to account for the worst-case possibility. Thus, if we do not want to leak any information we have to perform all \(n\) iterations of the algorithm. But such an implementation is unacceptably slow and better ones clearly exist. For example, we could simply sort the input sequence using some oblivious sorting network and pick the \(k\)-th element of the result. That yields a much better \(O(n \log^2 n)\)-time algorithm.
If we are willing to accept leaking the number of iterations that the algorithm performs, we can do much better. Unfortunately, this may actually give away a lot about the original data. For example, when we select the first element as the pivot during each iteration and the algorithm performs \(n\) iterations then we learn that the input list is sorted. Randomized pivot selection makes such a leak unlikely but does not eliminate it.
template <domain D>
D int quickselectPrivate(D int[[1]] list, uint k) {
    assert(k < size(list));
    D int pivot;
    while (true) {
        pivot = list[0];
        list = list[1:];
        D bool[[1]] mask = list < pivot;
        D uint l = sum(mask);
        if (declassify(k == l)) break;
        mask = choose(k > l, ~mask, mask);
        k = choose(k > l, k - l - 1, k);
        list = partition(list, mask);
    }
    return pivot;
}
The secure implementation follows quite straightforwardly from the public one. To stop iterating, we need to declassify the result of the comparison \(k = l\). We have to update the variables mask and k obliviously. Updating the working list, however, is tricky. Namely, we cannot simply remove the unnecessary elements, as this would give away more information than we would like. One possible solution is to use an additional bit vector indicating whether an element in the list is actually there or not, but this would complicate the implementation. A simpler solution is to replace the elements that are not needed with some reserved value that is guaranteed to be larger than all others, and move them to the end of the sequence.
Secure partitioning is performed in the following way. First we replace the elements of the sequence corresponding to false values in the mask with maximal integer values, and then we sort the sequence by the Boolean mask, moving true values to the front. This can be done similarly to the Boolean counting sort step of the radix sort in [2]. Sorting by a Boolean vector can be done in effectively linear time; hence, the partitioning is also linear-time and takes a constant number of communication rounds. Given that, we can say that on average the algorithm takes \(O(n \log n)\) time and \(O(\log n)\) communication rounds. This is already as good as sorting-based solutions.
To go even further, if we are willing to leak the number of elements smaller than the pivot on every iteration, we can achieve a linear-time solution. When declassifying \(l\) we can, instead of partitioning, remove the unneeded elements, thus reducing the size of the working list. To improve security we can pick the pivot securely. This can be done by securely shuffling the working list at the start of every iteration and continuing as before.
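A possible sketch of this leakier variant is given below. It assumes a shared3p domain with the standard library shuffle function; the removeFalse helper is purely illustrative and filters a private vector by a public mask.
template <domain D : shared3p>
D int[[1]] removeFalse(D int[[1]] arr, bool[[1]] mask) {
    // Count the surviving elements to size the output vector.
    uint n = 0;
    for (uint i = 0; i < size(mask); ++i) {
        if (mask[i]) ++n;
    }
    // Copy over only the elements whose (public) mask value is true.
    D int[[1]] out(n);
    uint j = 0;
    for (uint i = 0; i < size(arr); ++i) {
        if (mask[i]) {
            out[j] = arr[i];
            ++j;
        }
    }
    return out;
}

template <domain D : shared3p>
D int quickselectLeaky(D int[[1]] list, uint k) {
    assert(k < size(list));
    D int pivot;
    while (true) {
        list = shuffle(list);           // pick the pivot securely
        pivot = list[0];
        list = list[1:];
        D bool[[1]] mask = list < pivot;
        uint l = declassify(sum(mask)); // leak the number of smaller elements
        if (k == l) break;
        if (k > l) {
            mask = ~mask;
            k = k - l - 1;
        }
        list = removeFalse(list, declassify(mask)); // shrink the working list
    }
    return pivot;
}
Every declassify call in this sketch is a deliberate leak: revealing the comparison counts and the declassified mask is exactly the trade that buys the linear running time discussed above.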
Remove redundant computations
Any arithmetic operation on private values (except integer addition) takes a significant amount of time. You may be used to being able to copy and paste subexpressions in standard programming languages because the amount of computation time spent on the expression is insignificant. This is often not the case in SecreC. Store the result of the expression in a variable if you wish to reuse it.
For example, a value is often squared like this:
pd_shared3p float64 a = ...;
pd_shared3p float64 b = ...;
pd_shared3p float64 c = (a + b) * (a + b);
Instead you should create a temporary variable for the intermediate value:
pd_shared3p float64 a = ...;
pd_shared3p float64 b = ...;
pd_shared3p float64 sum = a + b;
pd_shared3p float64 c = sum * sum;
If you use a loop in your program, move every computation that does not depend on the loop variable outside of the loop. Evaluating an expression 100 times when it could be evaluated once is very time-consuming.
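For example, the following sketch (with illustrative variable names) shows the same update written three ways, from worst to best.
pd_shared3p float64[[1]] data = ...;
pd_shared3p float64 a = ...;
pd_shared3p float64 b = ...;
// Slow: (a + b) * (a + b) is recomputed on every iteration.
for (uint i = 0; i < size(data); ++i) {
    data[i] = data[i] + (a + b) * (a + b);
}
// Better: hoist the loop-invariant expression out of the loop.
pd_shared3p float64 shift = (a + b) * (a + b);
for (uint i = 0; i < size(data); ++i) {
    data[i] = data[i] + shift;
}
// Best: vectorize the whole update and avoid per-element syscalls.
data = data + shift;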
Avoid floating point computations
Let us take a look at the performance table of Sharemind MPC operations. We can notice that if we multiply 10000-element int64 vectors we can do 11700 operations per second. If we use the float64 datatype we can only do 121 operations per second. This is a difference of two orders of magnitude. If the inputs to your program are integers, keep them as integers for as long as possible.
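As a sketch (assuming the shared3p casts from integers to floating point), an average can be computed with integer arithmetic throughout, converting to floating point only for the final division.
pd_shared3p int64[[1]] measurements = ...;
pd_shared3p int64 total = sum(measurements);
// Only the final division needs floating-point arithmetic.
pd_shared3p float64 mean = (float64) total / (float64) size(measurements);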
Use datatypes with smaller bit width
Let us take a look at the operations performance table again. Multiplying 10000-element uint64 vectors is over two times slower than multiplying uint8 vectors. Prefer smaller types when possible but keep in mind that arithmetic with small types can lead to overflow.
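A small sketch; the data and types are only illustrative.
// Ages fit comfortably into 8 bits, so a smaller type suffices.
pd_shared3p uint8[[1]] ages = ...;
pd_shared3p uint8[[1]] doubled = ages + ages; // beware: overflows for values above 127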
Use xor-shared integers for comparison operations
Let us take a look at the operations performance table again. Comparing 10000-element uint64 vectors is almost three times slower than comparing xor_uint64 vectors. If you have an algorithm which does a lot of comparisons it might be beneficial to convert to a xor-shared type and then back to an additively shared type after the comparisons. This strategy is used by the sorting procedures in the SecreC standard library.
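A possible sketch, assuming the shared3p module's reshare function for converting between the additively shared and xor-shared representations (the same conversion used by the standard library sorting routines).
pd_shared3p uint64[[1]] a = ...;
pd_shared3p uint64[[1]] b = ...;
// Convert once, then compare in the cheaper xor-shared representation.
pd_shared3p xor_uint64[[1]] ax = reshare(a);
pd_shared3p xor_uint64[[1]] bx = reshare(b);
pd_shared3p bool[[1]] less = ax < bx;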
Reduce the number of syscalls
SecreC programs use procedures implemented in C modules. Calling such a C procedure is called a syscall or system call in Sharemind parlance. Any operation on a private value causes a system call. For example, these operations cause syscalls:
- initialising a private vector;
- reading an element from a private vector;
- writing an element to a private vector;
- any arithmetic operation on private vectors.
The syscall interface is not very efficient so it can be beneficial to avoid performing syscalls. For example, if you copy elements from one vector to another in a loop you might perform a lot of syscalls.
To optimise this particular scenario we have gather and scatter syscalls which take a private vector and a public index vector and perform multiple reads/stores in parallel. Constructing the public index vector does not require syscalls.
The following is an example of how to use the gather syscall.
__syscall("shared3p::gather_uint8_vec",
__domainid(pd_shared3p), source, target, __cref idx);
In this case we read from a uint8 vector. The first argument specifies the protection domain, source is the vector we are reading from, target is the vector we are storing the read values into and idx contains the (uint64) indices of the positions we want to read.
The following is an example of how to use the scatter syscall.
__syscall("shared3p::scatter_uint8_vec",
__domainid(pd_shared3p), source, target, __cref idx);
Do not use slice syntax with arithmetic assignment on private values
The following statement combines slice syntax with arithmetic assignment:
a[0:10] *= 3;
Semantically it multiplies the first ten values of a by three. With gather and scatter syscalls it would be possible to evaluate this statement using O(1) syscalls and one multiplication, but the compiler generates code which loops over the slice indices and thus causes O(n) syscalls and ten multiplications.
The following snippet does only one multiplication:
a[0:10] = a[0:10] * 3;
It still performs O(n) syscalls. Let us assume that a is a pd_shared3p int64 vector. We can optimise it further like this:
uint[[1]] idx(10);
for (uint i = 0; i < 10; ++i) {
    idx[i] = i;
}
pd_shared3p int64[[1]] tmp(10);
__syscall("shared3p::gather_int64_vec",
          __domainid(pd_shared3p), a, tmp, __cref idx);
tmp = tmp * 3;
__syscall("shared3p::scatter_int64_vec",
          __domainid(pd_shared3p), tmp, a, __cref idx);
This kind of statement is not used very often but we point it out because it can be difficult to understand why your program is slow if you have written such a statement.
Implementation and integration with Sharemind MPC
In this section we give a short overview of the SecreC compiler and its integration with the Sharemind framework. We will also briefly discuss the bytecode to which the language compiles.
Custom bytecode
Sharemind MPC is able to execute custom bytecode. We will not go over the details of the bytecode but we will elaborate on why a custom bytecode language is used. The bytecode specification is available online.
We use custom bytecode for a few reasons. The most important one is security. By sacrificing some performance, we can have a more secure implementation where invalid operations, like division by zero, can gracefully stop the execution of the code instead of crashing the entire platform. Many bytecode languages have access to foreign functions and raw memory but this is something we wish to avoid for the sake of security. Custom bytecode allows for fine-grained control of memory. A program can be stopped by the VM when it goes over some memory allocation limit. Because we avoid garbage collection, the programmer can better predict memory usage.
The secondary reason for having custom bytecode is portability. Namely, different architectures implement floating-point operations in a different manner. For example, machine instructions for computing sine might give one result on one machine but a slightly different result on another. In that case the control flow of a program executing on these machines could differ. Many existing bytecode languages rely on floating-point operations and are therefore not suitable for the distributed execution necessary for MPC. Sharemind's bytecode relies on a software implementation of floating-point operations. Currently, we use the SoftFloat library (Release 2c).
Integration with Sharemind MPC
The SecreC language is not tightly coupled to Sharemind MPC. The compiler is just a standalone tool that takes a SecreC program text and produces a bytecode executable. The bytecode, in turn, can be executed using Sharemind. Conceptually, the bytecode can be generated from some other language or can even be manually written.
The SecreC compiler does not have to be deployed together with Sharemind but it usually is. This is because executing untrusted code on Sharemind is a risk and, therefore, any code must be reviewed before it is executed. Bytecode is much more difficult to audit than high-level SecreC programs. Hence, system administrators usually compile the SecreC file themselves after auditing the code, and then manually deploy the compiled bytecode file. The SecreC compiler is open source and can be verified by anyone.
A standard Sharemind deployment scheme is given in the following figure. The figure depicts how a SecreC script app.sc is compiled and run on a Sharemind Application Server instance. For each of the three parties, the script is compiled to a bytecode file app.sb. When instructed, the Sharemind instance can then execute the deployed script. Before execution, the Sharemind Application Server checks that the bytecode files are identical.
The bytecode contains the list of protection domains that it expects to use, and also the list of system calls that it may perform. Before execution, Sharemind verifies if such protection domains have been configured and checks that all system calls are supported. After verification, Sharemind instantiates the required protection domains. This can include setting up various facilities such as random number generators, local overlay networks, database engines, and logging back-ends. If everything goes well, the program is executed after the setup phase.
The individual bytecode instructions do not include network communication, random number generation or any other security primitives. All of these tasks are performed via system calls to Sharemind. When the bytecode invokes a system call, control is given to the protection domain that implements it. Generally, a system call first verifies that its parameters (e.g., number of arguments, input lengths, input types) are correct, and then executes the actual logic. For example, the system call could perform the additive three-party multiplication protocol. To execute that protocol, protection domains need to generate random numbers and perform network communication between each other.
Using the table database interface
SecreC supports a basic file-based table database system intended for storing and organizing large amounts of data to be privately processed in SecreC.
Before creating a table, a connection must be opened to the data source with the function tdbOpenConnection. The name of the data source is set in Sharemind's configuration. Before opening a new connection to a different data source make sure the old connection has been closed with tdbCloseConnection. A simple table with a uniform data type can be created with the function tdbTableCreate.
Creating an empty table:
// creating an empty table with 3 columns of type int32 and protection domain shared3p
import shared3p;
import shared3p_table_database;
import stdlib;
import table_database;

domain pd_shared3p shared3p;

void main () {
    string datasource = "DS1";
    string table_name = "SimpleTable";
    tdbOpenConnection (datasource);
    pd_shared3p int32 data_type;
    tdbTableCreate (datasource, table_name, data_type);
}
Data is added to the table one row at a time with the function tdbInsertRow. Each row is a vector with every element corresponding to a column in the table. Data is read from the table one column at a time with the function tdbReadColumn. The identifier of the column can be the column's index or the name of the column.
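A minimal sketch, continuing the SimpleTable example above (the inserted values are only illustrative):
pd_shared3p int32[[1]] row = {10, 20, 30};
tdbInsertRow(datasource, table_name, row);
// Read back the first column, identified here by its index.
pd_shared3p int32[[1]] firstColumn = tdbReadColumn(datasource, table_name, 0 :: uint);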
Vector map
A vector map (also referred to as a value map or a vmap) is a data structure for making more complicated tables with column identifiers and multiple data types. A vmap can contain four different types of information: types, strings, values and indexes. A vmap is similar to a Python dictionary as the vmap contains parameters and data associated with those parameters.
Values added to the vmap can be either fixed length or variable length. Variable length means that the size of the values does not have to be uniform in one column, e.g. strings or vectors of variable length. Values are stored in the vmap as batches. Each batch covers all columns of one row. Data can be added one entry at a time with tdbVmapAdd{Type/String/Value/Index} for fixed-length data or with tdbVmapAddVlen{Type/Value} for variable-length data.
Creating a table from a vector map
When creating a table, instead of specifying a data type and the number of columns, a vmap can be used. The vmap must contain a type and a string for every column under the parameters "types" and "names" respectively. Data can be inserted into the table with tdbInsertRow, but instead of a vector a vmap can be used. The vmap must have values under the parameter "values" that are of the same type as their respective column in the table. Every batch in the vmap corresponds to a single row in the table.
Creating a table with a vector map:
import shared3p;
import shared3p_table_database;
import stdlib;
import table_database;
domain pd_shared3p shared3p;
void main() {
    string ds = "DS1";    // Data source name
    string tbl = "table"; // Table name

    // Open database before running operations on it
    tdbOpenConnection(ds);

    // Check if a table exists
    if (tdbTableExists(ds, tbl)) {
        // Delete existing table
        tdbTableDelete(ds, tbl);
    }

    // We want to create a simple table with three columns:
    //
    // -----------------------------------------------------------------
    // Name: | "index"       | "measurement"      | "have_measurement" |
    // Type: | public uint64 | pd_shared3p uint64 | pd_shared3p bool   |
    // -----------------------------------------------------------------
    //       | 0             | 0                  | true               |
    //       | 1             | 10                 | true               |
    //       | 2             | 20                 | true               |
    //       | 3             | 30                 | true               |
    //       | 4             | 40                 | true               |
    // -----------------------------------------------------------------

    // Create a new "vector map/value map" for storing arguments to the table
    // creation call.
    uint params = tdbVmapNew();

    // Column 0, name "index", type public uint64
    {
        uint64 vtype;
        tdbVmapAddType(params, "types", vtype);
        tdbVmapAddString(params, "names", "index");
    }

    // Column 1, name "measurement", type pd_shared3p uint64
    {
        pd_shared3p uint64 vtype;
        tdbVmapAddType(params, "types", vtype);
        tdbVmapAddString(params, "names", "measurement");
    }

    // Column 2, name "have_measurement", type pd_shared3p bool
    {
        pd_shared3p bool vtype;
        tdbVmapAddType(params, "types", vtype);
        tdbVmapAddString(params, "names", "have_measurement");
    }

    // Create the table
    tdbTableCreate(ds, tbl, params);

    // Free the parameter map
    tdbVmapDelete(params);

    // Insert some data
    uint nrows = 5;
    params = tdbVmapNew();
    for (uint i = 0; i < nrows; ++i) {
        uint64 index = i;
        pd_shared3p uint64 measurement = i * 10;
        pd_shared3p bool have_measurement = true;

        if (i != 0) {
            // This has to be called in-between rows
            tdbVmapAddBatch(params);
        }

        tdbVmapAddValue(params, "values", index);
        tdbVmapAddValue(params, "values", measurement);
        tdbVmapAddValue(params, "values", have_measurement);
    }
    tdbInsertRow(ds, tbl, params);
}
Erasing tables
Before creating a new table it is recommended to check whether a table with the same name already exists and to delete it if it does. A table can be deleted with tdbTableDelete. After the program finishes, SecreC automatically deletes all existing vector maps, but for safety reasons it is advised to do this manually.
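For example, a program could tidy up like this, using the functions shown above.
// Before creating a new table, remove a previous table with the same name.
if (tdbTableExists(ds, tbl)) {
    tdbTableDelete(ds, tbl);
}
// Vector maps are freed automatically when the program finishes, but it is
// safer to delete them explicitly once they are no longer needed.
tdbVmapDelete(params);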