Partially Encrypted Database

1. Introduction

1.1. Purpose

This page shows how to integrate Sharemind HI into your service to implement the Partially Encrypted Database architecture[1], which is best understood as a Sharemind HI for Web Services architecture. It is a detailed description meant for technical team leads to evaluate the integration process, as well as for developers to use as a reference. This guide is accompanied by:

  • source code of a demo project

  • the Sharemind HI development bundle (including the web client library in TypeScript)

  • dockerfiles which allow you to run the demo project

This guide is split into two logical parts:

Section 4, “Overview Of Changes Per Use Case”

Use case diagrams which show the action flow through the system for a given use case. They reference the files in the demo project.

Section 5, “Overview Of Changes Per Component”

A more detailed description of all the necessary changes in each component of the system with simplified code snippets. The goal of this section is to provide a list of specific changes per component.

Since there are many bits and pieces, Section 2, “Step-by-Step First Integration Milestone” describes how you can get started with the integration in practice.

1.2. Vocabulary

Dataflow Configuration (DFC)

A configuration file which defines the list of stakeholders, tasks, topics, and permissions.

Stakeholder

A human whose certificate is listed in the DFC. Stakeholders can have different roles and may be able to sign additional end-user certificates.

End-User

A human who wants to write or read sensitive data. They use a certificate which is signed by a stakeholder, but this certificate is not listed in the DFC itself.

CSR

Certificate Signing Request.

Task Enclaves

Custom enclaves that implement privacy preserving workflows.

Topics

Named and access-controlled data stores in Sharemind HI.

1.3. Data model

In the code examples we refer to the following example data model:

Diagram

The data model consists of three tables (users, table1, and table2). The users table holds the permissions of a single end-user, and is the only table required by the PEDB architecture. The tables table1 and table2 represent business logic tables, containing some sensitive data that needs to be protected.

2. Step-by-Step First Integration Milestone

This section describes one practical way to begin the integration. The guide contains many details, and it might otherwise be hard to know where to start.

Note: Although the integration happens in your own system (user application, web service, database), we recommend using the Dockerfile.tasks of the demo project with docker to run the Sharemind HI Server and the task enclaves. This way you can concentrate on the code itself in the beginning, instead of setting up the Sharemind HI Server, its configuration and the task enclaves.

  1. Run the demo application with the provided scripts and make sure that it works. Contact us if you need help with resolving any issues.
    Once the demo works, you can experiment directly in the demo code and see what happens.

  2. Try to run the Dockerfile.tasks standalone (the build and run scripts contain the relevant docker commands), such that you can use the dockerized Sharemind HI Server for development.

  3. Add a new (dummy) table, and implement the read and write user stories (Section 4.5 and Section 4.6).
    This allows you to familiarize yourself with the core functionality of the system, and you will then know all the places which need to be modified when you want to implement other use cases.

    1. Define the structure of the new table and add it to your database (see Section 5.1.1). Try to add at least one column for sensitive information (private column) and one column for non-sensitive information (public column).

    2. Modify the writeEnclave.cpp file run function of the demo project (see Section 5.2.2). Define the name of the route (e.g. the name of the new table), define the columns and table schema, and call the reencrypt function.

    3. Modify the readEnclave.cpp file run function of the demo project (see Section 5.2.3), similar to your changes to the writeEnclave.cpp file run function.

    4. In the web service:

      1. Integrate and configure the Sharemind HI client library (see Section 5.6.1 and Section 5.6.2)

      2. Create the forwarding function (see Section 5.6.7)

      3. Create the endpoint for writing data (see Section 5.6.4)

      4. Create the endpoint for reading data (see Section 5.6.5)

    5. In the client application:

      1. Integrate and configure the Sharemind HI client library (see Section 5.3.1 and Section 5.3.2). During development, use the following private key and certificate files:

        • lib/cmake/sharemind-hi/task-enclave-project-default-files/ca_stakeholders/end-entity-user-1-1.key

        • lib/cmake/sharemind-hi/task-enclave-project-default-files/ca_stakeholders/end-entity-user-1-1.crt

      2. Implement uploading of the short-term keys (see Section 5.3.4)

      3. Implement the encryption and sending of new data (see Section 5.3.5)

At this point you should be able to send data to the server, see that it is stored as encrypted blobs in the database, and read the sensitive data back on the client side. The major parts of the required code changes and the core functionality should now be familiar.

As next steps, you could set up Sharemind HI outside of the docker container to understand how its configuration works, or implement other workflows.

3. Cryptographic Principles

This section lays out the cryptographic principles which are used for value encryption and access control.

3.1. Value Encryption Mechanism

Each confidential field in a row is encrypted separately. Confidential fields are encrypted with AES-128-GCM[2]. AES-GCM is an encryption algorithm which provides confidentiality and authentication. The encryption operation produces a public tag, which is used during the decryption operation to detect any tampering of the ciphertext. In addition, one can supply additional authenticated data (AAD), which itself is not encrypted, but whose integrity is also protected by the tag. AES-128-GCM additionally uses an initialization vector (IV) which shall never be reused with the same encryption key.

iv = random_96_bits()
(ciphertext, tag) = encrypt(key, iv, plaintext, AAD)
plaintext = decrypt(key, iv, ciphertext, tag, AAD)
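
The following minimal sketch shows the encryption operation with OpenSSL's EVP API; this is an illustration under that assumption, and the demo project ships its own encryptField and decryptField helpers in clientApp/src/communication/Crypto.cpp, which may differ in detail:

// A minimal sketch, assuming OpenSSL. Error handling is abbreviated.
#include <openssl/evp.h>
#include <openssl/rand.h>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct EncryptedField {
    std::vector<uint8_t> iv;         // 96 bit IV, must never repeat for a key
    std::vector<uint8_t> ciphertext; // same length as the plaintext
    std::vector<uint8_t> tag;        // 128 bit authentication tag
};

EncryptedField encryptField(std::vector<uint8_t> const & key, // 16 B for AES-128
                            std::vector<uint8_t> const & plaintext,
                            std::vector<uint8_t> const & aad) {
    EncryptedField out{std::vector<uint8_t>(12),
                       std::vector<uint8_t>(plaintext.size()),
                       std::vector<uint8_t>(16)};
    if (RAND_bytes(out.iv.data(), static_cast<int>(out.iv.size())) != 1)
        throw std::runtime_error("RAND_bytes failed");

    EVP_CIPHER_CTX * ctx = EVP_CIPHER_CTX_new();
    int len = 0;
    // AES-128-GCM with a 12 byte IV (the OpenSSL default for GCM).
    EVP_EncryptInit_ex(ctx, EVP_aes_128_gcm(), nullptr, key.data(), out.iv.data());
    // The AAD is authenticated by the tag, but not encrypted.
    EVP_EncryptUpdate(ctx, nullptr, &len, aad.data(), static_cast<int>(aad.size()));
    EVP_EncryptUpdate(ctx, out.ciphertext.data(), &len,
                      plaintext.data(), static_cast<int>(plaintext.size()));
    EVP_EncryptFinal_ex(ctx, out.ciphertext.data() + len, &len);
    EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16, out.tag.data());
    EVP_CIPHER_CTX_free(ctx);
    return out;
}

// Decryption mirrors this: EVP_DecryptInit_ex, feed the same AAD, then set the
// received tag with EVP_CTRL_GCM_SET_TAG before EVP_DecryptFinal_ex, which
// fails if the ciphertext, tag or AAD was tampered with.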

The AAD is used for access control and row-level integrity. The following values should be part of each AAD:

Table name

To prevent reusing a field in a different table.

Primary key, or other public fields of the same row

To prevent reusing a field in a different row of the same table. Don’t use values of public columns which are expected to change, as otherwise the AAD cannot be reconstructed when the original value is overwritten.

Column name

To prevent reusing a field in a different column of the same row.

Department/Group/User/ …​ identifier

Additional row-based access control in the form of key=value pairs. Examples:

  • Rows only to be accessed by the same user who created them could use the AAD value user=<hex:uuid>.

  • Rows only to be accessed by a predefined group of users could use the AAD value groupname=<hex:uuid>.

For regular re-encryption queries from users, this access control information comes out of the signed permission string of the user. For batch processing workflows this information can be supplied to the task enclave as a column of the input table. As a current limitation this column needs to contain fixed size values, hence the suggestion to use UUIDs in the examples above.
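
As a concrete illustration, such an AAD could be assembled by joining the components with a separator, matching the AAD layout shown in the code comment in Section 5.2.1. The demo's actual AADtoBytes helper may be implemented differently:

// Illustrative sketch only; the demo's AADtoBytes helper may differ.
#include <cstdint>
#include <string>
#include <vector>

std::vector<uint8_t> AADtoBytes(std::vector<std::string> const & components) {
    // E.g. {"sensitive_data1", "table1", "<pk>", "<fk_users>"} becomes
    // "sensitive_data1;table1;<pk>;<fk_users>".
    std::string joined;
    for (auto const & component : components) {
        if (!joined.empty()) { joined += ';'; }
        joined += component;
    }
    return {joined.begin(), joined.end()};
}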

Additional notes:

  • Make sure not to use mutable data in the AAD (e.g. values of public fields which can be updated). If the original AAD cannot be reconstructed, the enclave refuses to decrypt the field. If you still need to use mutable values in the AAD, you need to re-encrypt the field with the new AAD whenever such a value changes.

  • Each field is protected independently from other protected fields. This is done to allow schema evolution, i.e. adding or deleting columns in the future. If all confidential fields were in a single protected BLOB, modifying the schema would require re-encrypting all rows, which could negatively impact the rest of the system (system load, blocking table write access, …​).

  • Ciphertexts in the same column all have the same fixed size. Variable length ciphertexts could leak information about their content purely based on their length. Additionally, the current task enclave example code assumes fixed size rows/columns, as this makes the enclave code easier. Variable length columns are technically possible, too.

  • The ciphertext is prefixed with a 36 B metadata header. This size is specific to the use of AES-128-GCM. Its content is shown below. If you decide to use a different AEAD algorithm, you might need to modify the header.

struct header_t {
    /**
     * The Data ID of the encryption key.
     * The enclave knows in which topic
     * the key is in.
     */
    uint64_t key_data_id;

    /**
     * The initialization vector.
     * No pair of (iv, key) shall be reused,
     * as this would break AES-GCM.
     */
    uint8_t iv[12];

    /** AES-GCM tag. */
    uint8_t tag[16];

    // Note: The AAD is not part of the header.
    // It needs to be reconstructed from auxiliary
    // data and contextual knowledge.
};

An example of how a database table needs to be changed is in Section 5.1.1.

3.2. Access Control Mechanism

Explained in Section 5.4.4.

4. Overview Of Changes Per Use Case

This section describes in more detail how the different use cases work across the different actors and systems, to the level of detail which is relevant to Sharemind HI. Each use case is explained with a sequence diagram which contains references to the code of the demo project. Code snippets themselves are not part of this section, but of Section 5.

4.1. Stakeholder & Solution Lifecycle

Stakeholders in this context are:

The Coordinator

manages the interactions between the stakeholders and end-users.

The Enforcers

who approve that the Sharemind HI Server is set up correctly. The two administrators need to trust them, or take on the role of the enforcers themselves.

Two administrators

who manage the end-users of the system. They are CA Stakeholders in Sharemind HI, which means that they are allowed to dynamically add new end-users to a running Sharemind HI solution without modifying the dataflow configuration. They also manage the permissions of the end-users using the Four Eyes Principle. They never need to interact with the Sharemind HI Server directly.

A dummy account for the web server

which invokes task enclaves. This is the only “stakeholder” who invokes task enclaves (end-users do not invoke task enclaves by themselves).

Enforcers can use the Sharemind HI Client CLI application from Cybernetica directly, but could also use a solution provided by the Coordinator. However, the enforcers should be able to audit the application which they use. The end-users probably receive customized software for the given service through an existing distribution channel.

Diagram

4.2. End-User Lifecycle

End-users are the service users who create and read sensitive data. The administrators manage the end-users through two processes:

Creating X509 Credentials

which are required to access the Sharemind HI Server (authentication). Since a user cannot do anything without additional permissions, this process is managed by a single administrator.

Signing a set of permissions

which is required to access sensitive data from the database (authorization). This grants users access to the sensitive data and thus is protected by the Four Eyes Principle.

The private key UPk and X509 certificate UC are required for the interaction with Sharemind HI. However, they could be hidden within the user device, or passphrase protected and stored in an external service[3] in case the user switches platforms.

When an end-user loses access to the UPk and UC, they can go through this process anew and regain access to all the data.

Diagram

4.3. Long-Term Storage Key Management

The long-term storage keys are stored in the storage_keys topic in the Sharemind HI Server. The storage key itself is secret and thus needs to be created and written into the storage_keys topic by a task enclave, the keygen_enclave. From time to time a new storage key should be created to encrypt new records, which can be achieved e.g. by invoking the keygen_enclave from a timer within the web service or from a cron job. Older keys are needed to decrypt existing encrypted sensitive data from the database.

Note: The storage_keys topic needs to use the trigger_state_save_on_upload option in the DFC, such that the metadata of a newly created storage key is safely persisted before the key is used for encryption. Otherwise a power failure might lead to the loss of the new storage key, and thus to the loss of any sensitive data which was encrypted with it.

Figure 1. The life cycle of each individual storage key. Since the topic uses the trigger_state_save_on_upload option, the creation of the key and persisting of its metadata to the disk is atomic from the outside.
Figure 2. Only the last storage key is used for encryption. But all keys are kept around to allow decrypting the existing encrypted fields.
Diagram

4.4. End-User Short-Term Key Management

End-users upload a write key and a read key once per session to the write_keys and read_keys topics in the Sharemind HI Server. These keys are used for the Write Query (Section 4.5) and Read Query (Section 4.6), i.e. for uploading and downloading sensitive data. Except for this upload to Sharemind HI, the keys shall not leave the client application; otherwise sensitive data might be leaked.

This is the only situation in which end-users communicate directly with the Sharemind HI Server. Depending on the infrastructure at hand, the communication between the Sharemind HI Client on the end-user device and the Sharemind HI Server can be realized in two ways:

Direct Communication

May be used if the gRPC(-web) endpoint of the Sharemind HI Server (or a gRPC-web proxy) is exposed to the application on the end-user device. The Sharemind HI Client on the end-user device talks directly to that endpoint. This is shown in the first sequence diagram.

Passthrough Communication

Needs to be used if the gRPC(-web) endpoint of the Sharemind HI Server (or a gRPC-web proxy) is not exposed to the application on the end-user device, but only accessible to the web service. The messages are tunneled through the existing communication channel and then moved back onto the gRPC communication channel between the server-side Sharemind HI Client and the Sharemind HI Server. The special TunneledChannel and TunneledClient modes for client libraries are used within this use case. This is shown in the second sequence diagram.

Note: If the web service uses the Sharemind HI TypeScript client library, an additional proxy server is required to translate between the gRPC-web and gRPC protocols.

Diagram
Diagram

4.5. Write Query

Confidential data moves from the end-user device to the database in the following way:

  1. The client application encrypts the sensitive data on the end-user device.

  2. The web service forwards the encrypted data to the task enclave.

  3. The task enclave decrypts the data with the specified write key (as a public task argument - the key ID information in the EncryptionHeader of the protected fields is ignored).

  4. The task enclave encrypts the decrypted data with the latest long-term storage key.

  5. The web service stores the re-encrypted data in the database.

As a result, the sensitive data is persisted in the database and protected with a long-term storage key. Only the end-user and the task enclave ever see the sensitive data in plaintext.

Diagram

4.6. Read Query

Confidential data moves from the database to the end-user device in the following way:

  1. An end-user requests sensitive data from the web service.

  2. The web service reads the requested encrypted data from the database and forwards it to the task enclave.

  3. The task enclave decrypts the data with a long-term storage key (which is specified in the EncryptionHeader of the protected fields).

  4. The task enclave encrypts the decrypted data with the short-term read key of the end-user (which was specified as a task argument).

  5. The web service sends the re-encrypted data to the end-user device.

As a result, the sensitive data is accessible to the end-user. Only the end-user and the task enclave ever see the sensitive data in plaintext.

Diagram

4.7. Advanced - Batch processing

In addition to simple read and write operations, it is possible to process the sensitive data inside the task enclaves in a way that no person sees the sensitive values. For example, one could aggregate and summarize the sensitive information to generate reports and dashboards, or train machine learning models based on sensitive information. The flow is similar to the read query, with an added batch processing operation and an additional integrity (hash) check to enable user provided inputs for the processing.

Diagram

Note: The hash is needed to ensure that the user provided input arguments for processing have not been manipulated.

Additionally, it is possible to provide freshness guarantees by including a nonce with the input parameters, and to provide sensitive parameters by adding elements of the write query into the flow. However, these are not demonstrated here.

5. Overview Of Changes Per Component

This section concentrates on the code changes within a single component. The goal is to provide you with a compact overview of all the changes which need to be done in a given component.

5.1. Database side:

5.1.1. Update the Database Schema

Fields which contain sensitive information protected by Sharemind HI need to be stored in a special way in the database. The encrypted field consists of an encryption header and a ciphertext. They can be stored together in a BINARY field (or base64 encoded in a CHAR field, …​).

CREATE TABLE IF NOT EXISTS "tableName" (
    ... other fields

    -- Before: Unprotected sensitive fields
    sensitive_data VARCHAR(64)

    -- After: Protected confidential fields,
    -- 36 B header plus 64 B ciphertext (padded, fixed size)
    sensitive_data BINARY(100),
    -- OR if you use base64 encoding:
    sensitive_data CHAR(136)
);
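
The column sizes above follow from the header layout in Section 3.1; as a small sketch of the arithmetic:

#include <cstddef>

// 8 B key Data ID + 12 B IV + 16 B tag, see the header_t struct in Section 3.1.
constexpr std::size_t HEADER_SIZE = 8 + 12 + 16;                   // 36 B
constexpr std::size_t CIPHERTEXT_SIZE = 64;                        // padded plaintext
constexpr std::size_t BINARY_SIZE = HEADER_SIZE + CIPHERTEXT_SIZE; // 100 B -> BINARY(100)
constexpr std::size_t BASE64_SIZE = (BINARY_SIZE + 2) / 3 * 4;     // 136 B -> CHAR(136)
static_assert(BASE64_SIZE == 136, "matches CHAR(136) above");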

Note: As described earlier, sensitive fields are protected with an AEAD encryption scheme which uses the AAD for integrity protection and access control. The construction of the AAD needs to be well considered; be aware of the following issues in particular:

Renaming of static names

such as table names or column names. These could be modified in the future, thus making it complicated to reconstruct the AAD.

Schema changes

could remove non-sensitive columns from a table whose values were used in the AAD construction.

Access to required AAD components

needs to be provided, as otherwise the task enclave fails to decrypt the data. This is relevant e.g. in joins for batch processing, where the table name or column name is no longer obvious, or where values of public fields of the same row have been omitted from the join.

In all of these cases the affected tables could be re-encrypted with a migration task enclave (described in Section A.7 of Appendix A), using the old and the new AAD, but this is a resource intensive process for larger tables.

5.2. Sharemind HI Server:

You need to implement a couple of task enclaves for the different tasks at hand, and then configure the Sharemind HI Server to tie everything together. Additional examples for operating with the PEDB architecture and Sharemind HI are listed in Appendix A.

Note: See the Data Analysis Platform Tutorial for how to set up the development environment for the task enclaves.

5.2.1. Overall Flow In Task Enclaves

Task enclave code for the different use cases follows a common pattern. The high level idea is explained in this section, and the use case specific sections go into the details of diverging behavior.

The following code is a simplified version of this high level pattern:

// Used for supplying additional AAD values to the
// reencryption method.
struct MetaData {
    std::string tableName;
    /* more ... */
};

// Define a metadata field such that it can later be used in `TableSchema`. The
// name must also be the name of a member of the metadata type which you use in
// the `reencrypt` function.
METADATA_FIELD(tableName);
// ...

// Define the names of all the columns for later use in the `TableSchema`.
COLUMN(pk);
COLUMN(fk_users);
COLUMN(sensitive_data1);
// ...

// Entrypoint into the enclave. Invoked asynchronously by the `taskRun`
// action from the client libraries.
void run(TaskInputs const & inputs, TaskOutputs &) {
    // Note: In the code, argument parsing and permission signature verification
    // is moved into the respective Context object. This way the `run` function
    // stays leaner and contains only the interesting bits.

    // Parse the `TaskArgument` key-value pairs.
    auto const arguments = Arguments(inputs);

    // An object which wraps all the relevant information for the
    // reencryption operation. Actual class is e.g. ReportContext.
    auto ctx = SomeContext{arguments, ...};

    // Verify that the permissions match the recipient identity, and
    // extract the permissions such that you can verify them.
    // NB This is not necessary when the task was invoked by a
    // cron job.
    auto const permissions = verifyAndParsePermissions(
            ctx.outputKeyData.producer.upid /* or from input key */,
            arguments.permission_data,
            arguments.admin_signature_1,
            inputs.dataflowConfiguration().stakeholder("admin1").publicKey(),
            arguments.admin_signature_2,
            inputs.dataflowConfiguration().stakeholder("admin2").publicKey());

    if (arguments.route == "route1") {
        MetaData metaData{{"table1"}};
        // TODO you need to implement this.
        YourPermissionsCheckFunction("table1", permissions);

        // Define the table schema and used AADs of the incoming data.
        // Outgoing data uses the same structure and AADs.
        // Or you might need to define two table schemas.
        using TS = TableSchema<Public<36, pk>,
                               Public<36, fk_users>,
                               /*
                               If the field 'pk' contains
                               "7638e763-861f-462f-bd62-9d18cd666f8c"
                               and the field 'fk_users' contains
                               "96ba3380-f3b9-4074-a21c-2c191dd4a4ff",
                               then the AAD of the field 'sensitive_data1' is:

                               "sensitive_data1;table1;7638e763-861f-462f-bd62-9d18cd666f8c;96ba3380-f3b9-4074-a21c-2c191dd4a4ff"
                               */
                               Private<40, sensitive_data1, AAD<tableName, AllPublicColumns>>>;

        // This call does all the heavy lifting: reading, fetching
        // decryption keys, decrypting, encrypting, writing.
        ctx.reencrypt<TS>(metaData);

    } else ... /* all your different routes */
}

The majority of the actions are implemented in helper functions which you can use as-is, just modifying how they are used. Inline documentation provides additional information.

  1. Reading arguments. The list of arguments is defined in the Arguments struct. If you want to modify the names or add arguments, please modify the Arguments struct.

  2. Constructing a helper struct of type Context. This gathers different information into a single context variable which can then be used to perform the re-encryption action.

  3. Verifying & parsing the permissions of this user using the verifyAndParsePermissions function. This function assumes the comma-separated layout of the permissions string and strips the Upid value from the beginning.
    Note: This is not necessary when the task was invoked by a cron job. The results of the cron job will typically be stored either in a topic or in the database. In the latter case, access controls need to be enforced when reading the data, i.e. the AAD needs to contain enough relevant information.

  4. Do the routing. This is an if-else if-…​ chain which decides which re-encryption operation to perform. Note that in C++ you cannot use std::string or string literals inside of switch statements.

  5. Verify that the permissions of the user are sufficient to read data from that table. Such a check can also be done on a per-row basis (not shown here).

  6. Define the table schema TableSchema of the incoming and outgoing data. Right now the table schema can only contain fixed size columns, so any VARCHAR columns need to be padded manually up to a uniform length. This is just an implementation trade-off to keep the code simpler and can be changed later.
    Note: Outgoing data might need a separately defined TableSchema.

  7. Performing the real action, for example re-encryption like ctx.reencrypt<TableSchema<…​>>(metadata). This function provides a declarative way to perform the re-encryption operation. You need to declare the schema of the input table which you want to re-encrypt, and the function then does all the necessary I/O, decryption and encryption.

5.2.2. Implement the task enclave for inserting data to the DB

This section is related to the sequence diagram in Section 4.5, #13 - #28. The general code flow is shown in Section 5.2.1 and available in the file tasks/src/writeEnclave/writeEnclave.cpp.

5.2.3. Implement the task enclave for reading data from the DB

This section is related to the sequence diagram in Section 4.6, #13 - #28. The general code flow is shown in Section 5.2.1 and available in the file tasks/src/readEnclave/readEnclave.cpp.

5.2.4. Advanced - Implement the batch processing task enclave

This section is related to the sequence diagram in Section 4.7, #9 - #27. The general code flow is shown in Section 5.2.1 and available in the file tasks/src/batchProcessingEnclave/batchProcessingEnclave.cpp.

Note: The output must contain a column which contains the sha256 hash over the input arguments, and which is used as part of the AAD for the encrypted fields of the output. The hash provides integrity protection, as illustrated in the sequence diagram below. If some request type does not contain any input arguments for customization, then the hash can be omitted as well.

If necessary, an additional nonce (a random 64 bit value or a UUID, which is re-generated for each query and has not been used previously) could be included in the input to add freshness guarantees, although the untrusted web service could still provide the task enclave with outdated information from an old snapshot of the database.

Figure 3. End-user centric view to illustrate the importance of integrity protection of input arguments in the full round trip.

5.2.5. Setup the HI Server

Before starting the Sharemind HI Server, make sure it is properly set up and configured.

The DFC requires special attention. A working example of the DFC with documentation is available in tasks/test/dataflow-configuration-description.yaml.

If you use the Sharemind HI TypeScript Client Library on the server side, then please refer to the README.md file in the TypeScript library. It goes into the details of the additional setup required for the TypeScript client library.
In particular, note that you need to add the following configuration option:

# Only necessary when using the TypeScript Client Library on the server.
Server.RequireClientCertificate: "DO_NOT_REQUEST_OR_REQUIRE_OR_VERIFY"

This is required, because the gRPC web proxy does not support client authentication.

5.3. Client application:

5.3.1. Integrate the Sharemind HI Client library (C++)

This library is used on the client side only to upload keys into Sharemind HI. This means that actions like read or write requests do not use the Sharemind HI Client library.

You received the Sharemind HI Client C++ library separately (as source code or precompiled library). You need to integrate this into your build environment.

5.3.2. Configure the Sharemind HI Client library (C++)

The most important type in the Sharemind HI Client library for C++ is sharemind_hi::Client. You need to provide the following information, and you need to decide which information to hardcode, sideload or ask from the user:

  • the user’s certificate and private key (possibly created by your client application)

  • fingerprints of the enclaves (provided by Cybernetica)

  • the Intel Attestation Service Signing Certificate Root (can be downloaded from Intel)

  • the certificates of the enforcers, needed at runtime (these are also configured in the DFC)

The Sharemind HI Server only speaks gRPC. However, the client library can be used in a tunneled mode, where the library itself does not perform gRPC calls. Instead, it gives messages to a callback which needs to transmit the bytes to the Sharemind HI Server. You can thus send the data over an existing API endpoint to your server, and your server then tunnels the request (described in Section 5.6.2).

The clientApp/src/communication/HICommunication.cpp file shows how to construct the client object (buildHIclient function).

// Whenever the `client` object wants to send a message to the Sharemind HI
// Server, it invokes the callback.
auto tunneledChannel = sharemind_hi::client::TunneledChannel{
        [&your_api](std::string const & method, std::string const & serializedRequestBLOB)
        -> std::string {
            // #7 KMI Forward the request through your API, and let your web
            // server do the communication with the Sharemind HI Server.
            return your_api.put("/HI/" + method, serializedRequestBLOB);
        }};
auto attestationOptions = ...;
auto callbackHandler = ...;
auto sessionData = ...; // key, cert and more - creation explained in a later section.

// The `client` object which can live for the entire session of the user.
auto client = sharemind_hi::client::Client(attestationOptions,
                                           callbackHandler,
                                           sessionData,
                                           tunneledChannel);

5.3.3. Implement the private key + certificate creation

This section is related to #3, #4, #5 and #9 of the sequence diagram in Section 4.2, and the login shown in #2 of the sequence diagram in Section 4.4. The code is in the file clientApp/src/client/Client.cpp. The code also shows how to encrypt the private key with a passphrase, store it in a database, and download and decrypt it upon the next login. These actions are only mentioned in a note in the sequence diagram.

NOTE: During development, when you have not yet created your own deployment certificate, use the following private key and certificate instead of generating new ones:

  • lib/cmake/sharemind-hi/task-enclave-project-default-files/ca_stakeholders/end-entity-user-1-1.key

  • lib/cmake/sharemind-hi/task-enclave-project-default-files/ca_stakeholders/end-entity-user-1-1.crt

The private key and CSR generation can be done in various ways; the following snippet uses the widespread OpenSSL CLI (more details can be found in the Certificate & Key Setup page):

# Two ways to create the client key:
# (1) Without passphrase.
openssl ecparam -name prime256v1 -genkey -noout -out client.key
# (2) With passphrase (PKCS#8).
openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:P-256 -aes-128-cbc -out client.key

openssl req -new -key client.key -out client.csr

The generated Certificate Signing Request (CSR, or UCSR in the sequence diagram) then needs to be sent to the administrator who signs the certificate (Section 5.4.2). Sending the CSR could be done for example via e-mail, or by storing it in the DBMS or some separate storage system.

When it is signed, the user needs to access it whenever they initialize the Sharemind HI client library, i.e. during login (#2 of the sequence diagram in Section 4.4). The certificate needs to be used in the future by the administrators, too, hence a centrally accessible storage system like the DBMS might be a suitable place for the certificate.

Note: The Sharemind HI Client libraries usually expect the certificate and private key in the PEM or DER format, and implement conversion functions between the two.

5.3.4. Upload the write key and read key

This section is related to the sequence diagram in Section 4.4. Once per login, the user creates keys and uploads them with the dataUpload function of the Sharemind HI Client library. The keys themselves and the returned IDs need to be stored for the duration of the session, but not in persistent storage, to prevent leakage of the keys.

// #5 #19 Use your favorite source of randomness, e.g. arc4random or SecRandomCopyBytes
std::array<uint8_t, 16> const writeKey = ...;
std::array<uint8_t, 16> const readKey = ...;

// #6
auto const writeKeyId = client.dataUpload(
    "write_keys",
    config.trustedEnforcers,
    {writeKey.data(), writeKey.size()}
);
// #20
auto const readKeyId = client.dataUpload(
    "read_keys",
    config.trustedEnforcers,
    {readKey.data(), readKey.size()}
);

// #16 Use your favorite source of randomness, e.g. arc4random or SecRandomCopyBytes
std::array<uint8_t, 12> const writeKeyIv = ...;

5.3.5. Adding encryption and decryption of outgoing and incoming data

This section is related to the sequence diagrams in Section 4.5 and Section 4.6. An example encryption function encryptField and decryption function decryptField are provided in clientApp/src/communication/Crypto.cpp. You probably want to customize them or add additional abstractions on top of them, e.g. encryptFieldB64 and decryptFieldB64, which directly perform base64 encoding (for transmission via JSON).

Encrypting data on the user device is required when data shall be inserted into the database (Section 4.5):

// #5 #15 Of the section 'End-User Short-Term Key Management'.
auto const & writeKey   = /* created during login. */;
auto const & writeKeyId = /* created during login. */;

auto const table1_row = Table1 {
    .pk = "...",                // 36 bytes, unencrypted
    .fk_users = "...",          // 36 bytes, unencrypted
    .sensitive_data1 = 0,       //  4 bytes, to be encrypted
};

// #5
// With JSON the binary data needs to be stringified, e.g. by using base64.
auto encrypted_field =
    encryptFieldB64(
        writeKey,
        writeKeyId,
        table1_row.sensitive_data1,
        // #4, same as will be used in the task enclave, #26.
        AADtoBytes({"sensitive_data1", "table1", pk, fk_users}));

Json rowJson;
rowJson["pk"] = table1_row.id;
rowJson["fk_users"] = table1_row.fk_users;
rowJson["sensitive_data1"] = encrypted_field;

// #6 #35
your_api.put("/table1/" + user_id + "/" + writeKeyId, rowJson);

Decrypting data on the user device is shown in Section 4.6:

// #19 #29 Of the section 'End-User Short-Term Key Management'.
auto const & readKey   = /* created during login. */;
auto const & readKeyId = /* created during login. */;

// #4 #33
// The re-encryption enclave needs to know which key to use for encryption.
// The ID of this key needs to be transmitted as part of your `get` API endpoint.
auto table1Json = your_api.get("/table1/" + user_id + "/" + readKeyId);

std::vector<Table1> result;
for (auto const & rowJson : table1Json) {
    auto & table1_row = result.emplace_back(Table1{
            .pk = rowJson["pk"],
            .fk_users = rowJson["fk_users"],
            .sensitive_data1 = {} /* decrypted below */,
    });

    // #35
    table1_row.sensitive_data1 = decryptFieldB64<decltype(table1_row.sensitive_data1)>(
            readKey,
            rowJson["sensitive_data1"].as_string(),
             // #34 Construct the same AAD as in the task enclave
             // during re-encryption #26.
            AADtoBytes({"sensitive_data1", "table1", table1_row.pk, table1_row.fk_users}));
}

5.3.6. Advanced - Invoke The Batch Processing

This section is related to the sequence diagrams in Section 4.7. It is similar to Section 5.3.5, so read through that section first.

// #19 #29 Of the section 'End-User Short-Term Key Management'.
auto const & readKey   = /* created during login. */;
auto const & readKeyId = /* created during login. */;

// Some argument for the report generation, provided by the user.
const std::string customInputParameter = "...";

// You can combine the processing call with the read query to have only a single
// roundtrip from the client side.
const auto & outputJson =
    // #3 Make the request
    your_api.put("/batch_processing/" + user_id + "/" + readKeyId + "/" + customInputParameter );

// #38 Verify the hash
if (outputJson["hash"] != hash(customInputParameter)) {
    throw std::runtime_error("Hash mismatch, the batch processing enclave did not use my inputs");
}

// #39 #40 Decrypt similarly to read query use case, omitted here.

5.4. Administrator application:

The administrator does not need to interact with the Sharemind HI Server directly. Instead they manage their own keys, user certificates and user permissions. These steps are common to all Sharemind HI solution architectures and are thus already explained in other Sharemind HI documentation. The following sections give an overview and further references.

5.4.1. Key & Certificate Setup

This section is related to the sequence diagram in Section 4.1, ref Key & Certificate Setup. The details of this step are explained in the Certificate & Key Setup page. The administrators need to create a private key and CSR (same openssl commands as for the client, Section 5.3.3) and send the CSR to the Coordinator. When the Coordinator sends the signed certificate back, the administrator configures the certificate in their application, such that the Sharemind HI client library can connect to the Sharemind HI Server.

5.4.2. Implement signing CSRs

This section is related to the sequence diagram in Section 4.2, #7, where the administrator signs the certificate signing request (CSR) of the new user.

The administrator uses their private key and certificate (Section 5.4.1) to sign the CSR of a user. This procedure creates a set of custom V3 extensions in the X509 user certificate and can be performed with the OpenSSL CLI, for example as sketched below.
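
A hedged sketch with the OpenSSL CLI; the exact V3 extensions and all file names here are deployment specific placeholders, described in the Certificate & Key Setup page:

# Sign the user's CSR with the administrator key. The extension file
# (custom_v3_extensions.cnf, a placeholder name) must contain the custom
# V3 extensions expected by Sharemind HI.
openssl x509 -req -in client.csr \
    -CA admin.crt -CAkey admin.key -CAcreateserial \
    -extfile custom_v3_extensions.cnf \
    -out client.crt -days 365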

When this step is done, the user needs to configure the resulting certificate in their application/Sharemind HI Client library to communicate with the Sharemind HI Server (Section 5.3.3).

5.4.3. Integrate with the certificate store (e.g. DBMS)

This section is related to steps #8 and #12 of the sequence diagram in Section 4.2. Both storing and retrieving an X509 certificate of a user in/from the certificate store need to be implemented in the administrator application. The certificate store can be any storage system which is accessible to both the owner of the certificate (the user) and the two administrators. Hence the central DBMS might be a good place to store the user certificates.

5.4.4. Implement signing of user permissions

This section is related to step #14 of the sequence diagram in Section 4.2. Both administrators need to sign the permissions of the user, as the enclave which verifies the signatures implements the Four Eyes Principle to protect against a single malicious administrator.

User permissions are encoded as comma-separated strings, as in the following example which uses key-value pairs for structuring (UPID is explained below):

<UPID>,user_id=<user_id>,dept_id=<dept_id>,project=<project1>,project=<project2>,...

It might be useful to hex encode all the <…​> values such that the string can be easily displayed:

d7d20141-2a03-4f6e-96cf-4a96e8bf8215,user_id=7638e763-861f-462f-bd62-9d18cd666f8c,dept_id=96ba3380-f3b9-4074-a21c-2c191dd4a4ff,...

UPID

A concept used in Sharemind HI, which is constructed as sha256(0u64 || certificate_DER), where 0u64 is an eight-byte zero and certificate_DER is the binary (not hex or base64 encoded) DER representation of the certificate. It is used within Sharemind HI to shorten the user certificate into a fixed size representation. A sketch of this computation is shown after this list.

<user_id>

This can be the user ID from the database.

<dept_id> / <project1> / …​

Further values which grant permissions to a user, as you see fit for your use case.
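
A minimal sketch of the UPID computation from the definition above, assuming OpenSSL; the function name computeUpid is illustrative:

#include <openssl/sha.h>
#include <array>
#include <cstdint>
#include <vector>

// upid = sha256(0u64 || certificate_DER)
std::array<uint8_t, 32> computeUpid(std::vector<uint8_t> const & certificateDer) {
    uint8_t const zero[8] = {0}; // 0u64: an eight-byte zero prefix
    SHA256_CTX ctx;
    SHA256_Init(&ctx);
    SHA256_Update(&ctx, zero, sizeof(zero));
    SHA256_Update(&ctx, certificateDer.data(), certificateDer.size());
    std::array<uint8_t, 32> upid{};
    SHA256_Final(upid.data(), &ctx);
    return upid;
}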

This string, after inspection by the administrator, is signed with the private key of the administrator (Section 5.4.1), using standard ECDSA. With the OpenSSL CLI this can be done as follows, but requires post-processing to convert the ASN.1 signature into a format which is understood by Sharemind HI:

printf '.. permissions string ..' > permissions_string
openssl dgst -sha256 -sign admin_key.pem permissions_string > signature.asn1
python3 ./convert_signature.py # script is shown below.

The conversion can be done with a script similar to the following convert_signature.py, which stores the big endian X and Y components from the ASN.1 structure as two little endian 32 byte integers:

#!/usr/bin/env python3

input_filename = 'signature.asn1'
output_filename = 'signature.hi'

def read_binary_file(path):
    with open(path,'rb') as file:
        return file.read()

def write_binary_file(path, data):
    with open(path, 'wb') as file:
        file.write(data)

integer_size = 32
def integer_data_offset(integer_meta_offset, asn1):
    return integer_meta_offset + 2 + (asn1[integer_meta_offset + 1] & 1)

def signature_asn1_to_hi(asn1):
    integer1_offset = integer_data_offset(2, asn1)
    integer2_offset = integer_data_offset(integer1_offset + integer_size, asn1)
    integer1 = asn1[integer1_offset:][:integer_size]
    integer2 = asn1[integer2_offset:][:integer_size]
    # ASN.1 stores integers in big endian, but HI (the SGX SDK) wants the
    # numbers in little endian.
    return integer1[::-1] + integer2[::-1]

signature_asn1 = read_binary_file(input_filename)
signature_hi = signature_asn1_to_hi(signature_asn1)
write_binary_file(output_filename, signature_hi)

5.5. gRPC web proxy

The web library of the Sharemind HI Client communicates via the gRPC-web protocol. There needs to be a translation layer which transforms it into the regular gRPC protocol which can be understood by the Sharemind HI Server. An easy start is to use grpcwebproxy.
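
A hedged example invocation (the flag names come from the grpcwebproxy documentation, and the backend address is a placeholder; verify both against your deployment):

# Translate gRPC-web requests to gRPC for the Sharemind HI Server.
grpcwebproxy \
    --backend_addr=localhost:8080 \
    --run_tls_server=false \
    --allow_all_origins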

If you need more features, envoy can be used (though we don’t have experience with it).

5.6. Webserver:

5.6.1. Integrate the Sharemind HI Client library (TypeScript)

The TypeScript client library of Sharemind HI is provided to you by Cybernetica. It can be used from Node.js.

5.6.2. Configure the Sharemind HI Client library (TypeScript)

The configuration of the TypeScript library is similar to the configuration of the C++ library. However, on the server the client library is used in two modes:

(1) As a regular client to perform the taskRun action (and possibly others):

const session = new Session(config, {loggingCallback: console.log});

(2) As a tunneling client which lifts tunneled requests from the users onto gRPC:

// For short-term key management.
const tunneled_client = new TunneledClient(grpcWebProxyAddress);

Note that you need to use the address of the gRPC web proxy instead of the address of the Sharemind HI Server (as explained in the TypeScript library README.md).

5.6.3. Implement the long-term storage key renewal

This section is related to the sequence diagram in Section 4.3, #2. The renewal needs to be performed regularly; once a month is sufficient. Note that in Node.js the timer APIs use signed 32 bit numbers, so the maximum delay is slightly less than 25 days.

// This function invokes the keygen_enclave once every 24.8 days.
(async function addNewStorageKey() {
    await runKeygenEnclave(session);
    const maxDelay = 2147483647; // ca. 24.8 days
    setTimeout(addNewStorageKey, maxDelay);
})();

5.6.4. Implement the re-encryption call for the write flow

This section is related to the sequence diagram in Section 4.5, #7 - #10, #31 - #34. The write flow is very similar to the read flow in the following Section 5.6.5; a sketch of the differing task invocation is shown below.
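
As a sketch of the difference, only the task name and the key argument change compared to the read flow; the names write_enclave and write_key_id below are assumptions based on the read flow and may differ in your setup:

// Hedged sketch: identical to the read flow in Section 5.6.5 except for
// the task name and the key argument (names assumed, verify against the demo).
await session.taskRunSync(new mt.TaskRunRequest(
        new mt.ArbitraryName("write_enclave"),
        new mt.RequiredTrustedEnforcers([]),
        [
            new mt.TaskArgument("input_file", input_file),
            new mt.TaskArgument("output_file", output_file),
            new mt.TaskArgument("write_key_id", request_info.write_key_id),
            new mt.TaskArgument("route", "route1"),
            new mt.TaskArgument("permission_data", permission_info[0]),
            new mt.TaskArgument("admin_signature_1", permission_info[1]),
            new mt.TaskArgument("admin_signature_2", permission_info[2])
        ]));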

5.6.5. Implement the re-encryption call for the read flow

This section is related to the sequence diagram in Section 4.6, #5 - #11, #31, #32. This is the code which is required to re-encrypt database rows on the server side. The rows from the database are provided as the inTable1 argument, and the returned outTable1 can then be sent to the client, who can decrypt the data. The actual communication with Sharemind HI happens in the middle. The relevant code is also available in the files webServer/src/protectedTables/Table1.ts, webServer/src/server.ts, and webServer/src/HICommunication.ts.

// Some example database row.
type Table1 = {
    pk: string;                         // size: 36 bytes, unencrypted.
    fk_users: string;                   // size: 36 bytes, unencrypted.
    sensitive_data1: Uint8Array;        // size: 40 bytes, encrypted.
}

async function getTable1(session: HI.Session, request_info: ...) {
    // #5 #6 Your query against the DB to get the permission data.
    const permission_info = query_db_permissions(request_info);

    // #7 #8 Your query against the DB to get the requested rows
    const inTable1: Table1[] = query_db_table1(request_info);

    // Serialize the data into a binary, rectangular format.
    const bufs: Buffer[] = [];
    for (const row of inTable1) {
        bufs.push(Buffer.from(row.pk));
        bufs.push(Buffer.from(row.fk_users));
        bufs.push(Buffer.from(row.sensitive_data1));
    }

    // #9 Write the data to the disk.
    writeFileSync(input_file, Buffer.concat(bufs));

    // #11 Run and await the task.
    await session.taskRunSync(new mt.TaskRunRequest(
            new mt.ArbitraryName("read_enclave"),
            new mt.RequiredTrustedEnforcers([]),
            [
                new mt.TaskArgument("input_file", input_file),
                new mt.TaskArgument("output_file", output_file),
                new mt.TaskArgument("read_key_id", request_info.read_key_id),
                new mt.TaskArgument("route", "route1"),
                new mt.TaskArgument("permission_data", permission_info[0]),
                new mt.TaskArgument("admin_signature_1", permission_info[1]),
                new mt.TaskArgument("admin_signature_2", permission_info[2])
            ]));

    // #31 #32 Read back the re-encrypted rows
    const bytes = readFileSync(output_file);

    // Transform bytes back into js objects/json
    const TABLE1_TOTAL_BYTELENGTH = 36 + 36 + 40; // pk + fk_users + sensitive_data1
    const outTable1: Table1[] = [];
    for (let i = 0; i < bytes.length; i += TABLE1_TOTAL_BYTELENGTH) {
        outTable1.push({
            pk:                 bytes.toString('utf8', i +  0, i +  36),
            fk_users:           bytes.toString('utf8', i + 36, i +  72),
            sensitive_data1:    bytes.subarray(        i + 72, i + 112)
        });
    }

    // Done
    return outTable1;
}

5.6.6. Advanced - Implement the batch processing call

This section is related to the sequence diagram in Section 4.7, #5 - #11, #30 - #36. The general flow is very similar to the read query flow in Section 5.6.5, with an added processing function and an additional integrity check for public arguments. It is also possible to provide sensitive arguments; however, this is not demonstrated here.

// Some example database row.
type Input = {
    pk: string;                 // size: 36 bytes, unencrypted.
    sensitive_data: Uint8Array; // size: 40 bytes, encrypted.
    // ...
}

type Output = {
    hash: string                // size: 32 bytes, unencrypted
    // ...
}

async function batchProcess(session: HI.Session, request_info: ...) {
    // #4 #5 Your query against the DB to get the permission data.
    const permission_info = query_db_permissions(request_info);

    // #6 #7 Your query against the DB to get the requested rows
    const table: Input[] = query_db_table1(request_info);

    // Serialize the data into a binary, rectangular format.
    const bufs: Buffer[] = [];
    for (const row of table) {
        bufs.push(Buffer.from(row.pk));
        bufs.push(Buffer.from(row.sensitive_data));
        // ...
    }

    // #8 #9 Write the data to the disk.
    writeFileSync(input_file, Buffer.concat(bufs));

    // #10 Run and await the task.
    await session.taskRunSync(new mt.TaskRunRequest(
            new mt.ArbitraryName("batch_processing_enclave"),
            new mt.RequiredTrustedEnforcers([]),
            [
                new mt.TaskArgument("input_file", input_file),
                new mt.TaskArgument("output_file", output_file),
                new mt.TaskArgument("permission_data", permission_info[0]),
                new mt.TaskArgument("admin_signature_1", permission_info[1]),
                new mt.TaskArgument("admin_signature_2", permission_info[2]),
                new mt.TaskArgument("customInputParameter", request_info.customInputParameter)
            ]));

    // #35 #36 Read back the re-encrypted rows
    const bytes = readFileSync(output_file);

    // Transform bytes back into js objects/json
    const outputs: Output[] = [];
    for (let i = 0; i < bytes.length; i += OUTPUT_TOTAL_BYTELENGTH) {
        outputs.push({
            // ... as for other use cases.
        });
    }

    // Done
    return outputs;
}

5.6.7. Implement the forwarding to Sharemind HI

This section is related to the sequence diagram in Section 4.4. Forwarding (tunneling) incoming requests from a user to Sharemind HI can be done similarly to the following code, extracted from webServer/src/router.ts, which uses express:

// #1 This line was already shown earlier.
const tunneled_client = new TunneledClient(grpcWebProxyAddress);

// #8
const bytesParser = bodyParser.raw({ type: 'application/octet-stream' });
router.post("/HI/:method", bytesParser, async (req: Request, res: Response) => {
    const methodName: string = req.params.method;
    const serializedRequest = Uint8Array.from(req.body);

    let tunneled_result: Uint8Array;
    try {
        // #9 The request can be directly forwarded to HI without further inspection.
        tunneled_result = await tunneled_client.dispatch(methodName, serializedRequest);
    } catch (e) {
        res.status(500).send("HI Error:" + e);
        return;
    }

    // #13
    res.status(200).send(Buffer.from(tunneled_result.buffer));
});

Appendix A: Cookbook

In this section we will provide concrete code examples, or recipes, of common operations with the PEDB architecture.

The cookbook contains examples for the operations described in the subsections below.

After applying all the recipes, the data model will have changed:

Diagram

The added elements in the data model (the table table3 and the columns fk_table3 and sensitive_data5 in table table1) are colored orange. The deleted columns are regular_data1 and sensitive_data2 in table2. Finally, the contents of all values in the columns permission_string, admin_1_signature, and admin_2_signature in the table users have been updated.

For conciseness and simplicity, the recipes assume that the system can be restarted and data is not persisted. For applying the recipes in a live system refer to Section A.7: "Advanced - Re-encryption and migration in a live environment".

A.1. Adding a new table

Adding a new table to the data model requires making changes throughout the code base in all components. This section will go through them one by one by adding a new table table3 with a single regular column and a single sensitive data column.

A.1.1. Database side

The smallest change happens in the database, where the table has to be defined and added. With the example PostgreSQL database the addition looks like this:

CREATE TABLE IF NOT EXISTS table3
(
    pk UUID PRIMARY KEY,
    sensitive_data4 CHAR(56) NOT NULL
);

The value of sensitive_data4 is currently modeled as a 32-bit integer, and we are storing the ciphertext as a base64 encoded string. This means the total byte-length of the protected value is 4 B + 36 B = 40 B, which translates to 56 characters in base64 encoding.

The relevant code is also available in the file DB/init.sql.

A.1.2. Enclave side

Next we can add a route in the re-encryption enclaves. The additions in both enclaves are identical.

We first define the new columns used by the table. Note that we do not need to re-define columns that are also used in other tables (in this case the column pk).

// Table 2
// ...

// Table 3
COLUMN(sensitive_data4);

// ...

Then we define the new route and the table schema by adding a new else if branch:

    } else if (ctx.args.route == "route3") {
        MetaData metaData{{"table3"}};
        using TS = TableSchema<
                Public<UUID_SIZE, pk>,
                Private<4 + EH_SIZE, sensitive_data4, AAD<tableName, AllPublicColumns>>
                >;
        ctx.reencrypt<TS>(metaData);
    }

Changing the enclave code will also change the corresponding enclave fingerprints. If the fingerprints are hard coded in the DFC or client configurations, then those should also be updated to reflect the change.

The relevant code is also available in the files tasks/src/writeEnclave/writeEnclave.cpp and tasks/src/readEnclave/readEnclave.cpp.

A.1.3. Webserver side

The largest additions happen on the webserver side, where we have to define an intermediary class for the table, add new database queries to access the table, and add API endpoints for the users to provide data to and receive data from the tables. However, the general flow follows the same style as in Section 5.6.5.

// Some example database row.
type Table3 = {
    pk: string;                         // size: 36 bytes, unencrypted.
    sensitive_data4: Uint8Array;        // size: 40 bytes, encrypted.
}

async function getTable3(session: HI.Session, request_info: ...) {
    const permission_info = query_db_permissions(request_info);

    // Query the new table from your DB
    const inTable3: Table3[] = query_db_table3(request_info);

    // Serialize as with other tables
    const bufs: Buffer[] = [];
    for (const row of inTable3) {
        bufs.push(Buffer.from(row.pk));
        bufs.push(Buffer.from(row.sensitive_data4));
    }

    // Only the route changes in the task arguments
    writeFileSync(input_file, Buffer.concat(bufs));
    await session.taskRunSync(new mt.TaskRunRequest(
            new mt.ArbitraryName("read_enclave"),
            new mt.RequiredTrustedEnforcers([]),
            [
                new mt.TaskArgument("input_file", input_file),
                new mt.TaskArgument("output_file", output_file),
                new mt.TaskArgument("read_key_id", request_info.read_key_id),
                new mt.TaskArgument("route", "route3"),
                new mt.TaskArgument("permission_data", permission_info[0]),
                new mt.TaskArgument("admin_signature_1", permission_info[1]),
                new mt.TaskArgument("admin_signature_2", permission_info[2])
            ]));
    const bytes = readFileSync(output_file);

    // Deserialize as for other tables
    const TABLE3_TOTAL_BYTELENGTH = 36 + 40; // pk + sensitive_data4
    const outTable3: Table3[] = [];
    for (let i = 0; i < bytes.length; i += TABLE3_TOTAL_BYTELENGTH) {
        outTable3.push({
            pk:                 bytes.toString('utf8', i +  0, i + 36),
            sensitive_data4:    bytes.subarray(        i + 36, i + 76)
        });
    }

    // Done
    return outTable3;
}

These changes are reflected in the files webServer/src/server.ts, webServer/src/HICommunication.ts, and webServer/src/protectedTables/Table3.ts.

A.1.4. Client side

Lastly, the client side has to be updated similarly to Section 5.3.5. The encrypt and write operations:

auto const row = Table3 {
    .pk = "...",                // 36 bytes, unencrypted
    .sensitive_data4 = 0,       //  4 bytes, to be encrypted
};

auto const encrypted_field = encryptFieldB64(
    writeKey,
    writeKeyId,
    row.sensitive_data4,
    AADtoBytes({"sensitive_data4", "table3", pk})
);

Json rowJson;
rowJson["pk"] = row.id;
rowJson["sensitive_data4"] = encrypted_field;

your_api.put("/table3/" + user_id + "/" + writeKeyId, rowJson);

As well as read and decrypt operations:

auto table3Json = your_api.get("/table3/" + user_id + "/" + readKeyId);

std::vector<Table3> result;
for (auto const & rowJson : table3Json) {
    auto & row = result.emplace_back(Table3{
            .pk = rowJson["pk"],
            .sensitive_data4 = {},
    });

    row.sensitive_data4 = decryptFieldB64<decltype(row.sensitive_data4)>(
            readKey,
            rowJson["sensitive_data4"].as_string(),
            AADtoBytes({"sensitive_data4", "table3", row.pk}));
}

These changes are reflected in the files clientApp/src/Main.cpp, clientApp/src/communication/ServerCommunication.cpp, clientApp/src/protectedTables/Table3.cpp, and the corresponding headers.

Now all is ready to use the new table as you see fit in the client application.

A.2. Adding a new regular column

Regular columns come in two varieties - they can be a part of some sensitive column’s AAD, or not. If the column is not a part of any AAD, then no changes are required to the PEDB sub-system; however, additional care must be taken to ensure that the re-encryption enclaves receive the table in the same shape as before the schema change. If the column is a part of some sensitive column’s AAD (or if you wish to keep the table schemas in the enclaves consistent with the database schema), then the enclaves need to be updated to reflect the change.

Note that changing the AAD will require re-encrypting and migrating any existing data. This section assumes that no previous data is persisted and the system can be fully reset. To apply the recipe in a live environment, see Section A.7.

This section will walk through adding the foreign key column fk_table3 to table1.

A.2.1. Database side

On the database side we simply add the new column fk_table3 to the table:

CREATE TABLE IF NOT EXISTS table1
(
    pk UUID PRIMARY KEY,
    fk_users UUID NOT NULL,
    fk_table3 UUID NOT NULL,
    sensitive_data1 CHAR(56) NOT NULL,
    CONSTRAINT table1_fkey_users FOREIGN KEY (fk_users)
        REFERENCES users (pk) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT table1_fkey_table3 FOREIGN KEY (fk_table3)
        REFERENCES table3 (pk) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

The change is reflected in the file DB/init.sql.

A.2.2. Enclave side

Similarly to the database, we add the new column to the schema defined in the appropriate routes in both the read and write enclaves, and in the batch processing enclave. Note that since the AAD of sensitive_data1 includes AllPublicColumns, the new column will also automatically be included in the AAD. If your AAD has explicit columns instead, you will also have to decide if the new column should be included in the AAD.

MetaData metaData{{"table1"}};
using TS = TableSchema<
        Public<UUID_SIZE, pk>,
        Public<UUID_SIZE, fk_users>,
        Public<UUID_SIZE, fk_table3>,
        Private<4 + EH_SIZE, sensitive_data1, AAD<tableName, AllPublicColumns>>
        >;

ctx.reencrypt<TS>(metaData);

The new column also has to be defined before it can be used:

// ...
// Table1
COLUMN(pk);
COLUMN(fk_users);
COLUMN(fk_table3);
COLUMN(sensitive_data1);
// ...

The changes are reflected in the files tasks/src/writeEnclave/writeEnclave.cpp and tasks/src/readEnclave/readEnclave.cpp.

A.2.3. Webserver side

No large changes are required in the webserver; only the bytes-to-data transformations of the table structure need to be updated to match the schema:

type Table1 = {
    pk: string;                         // size: 36 bytes, unencrypted.
    fk_users: string;                   // size: 36 bytes, unencrypted.
    fk_table3: string;                  // size: 36 bytes, unencrypted.
    sensitive_data1: Uint8Array;        // size: 40 bytes, encrypted.
}

function getTable1(session: HI.Session, request_info: ...) {
    // ...

    // Serialize the data into a binary, rectangular format.
    const bufs: Buffer[] = [];
    for (const row of inTable1) {
        bufs.push(Buffer.from(row.pk));
        bufs.push(Buffer.from(row.fk_users));
        bufs.push(Buffer.from(row.fk_table3));
        bufs.push(Buffer.from(row.sensitive_data1));
    }

    // ...
    // ...

    // Transform bytes back into js objects/json
    const outTable1: Table1[] = [];
    for (let i = 0; i < bytes.length; i += TABLE1_TOTAL_BYTELENGTH) {
        outTable1.push({
            pk:                 bytes.toString('utf8', i +   0, i +  36),
            fk_users:           bytes.toString('utf8', i +  36, i +  72),
            fk_table3:          bytes.toString('utf8', i +  72, i + 108),
            sensitive_data1:    bytes.subarray(        i + 108, i + 148)
        });
    }

    // Done
    return outTable1;
}

These changes are reflected in the file webServer/src/protectedTables/Table1.ts.
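
As before, the stride constant must match the new row layout; following the sizes annotated above:

// 36 (pk) + 36 (fk_users) + 36 (fk_table3) + 40 (sensitive_data1)
const TABLE1_TOTAL_BYTELENGTH = 36 + 36 + 36 + 40; // = 148 bytes per row

The same adjustment applies in the following recipes whenever the row layout of a table changes.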

A.2.4. Client side

The client side needs to be updated similarly to the enclaves, to include the new column and use it as a part of the AAD. Note that the order of the elements in the AAD should also be the same as in the enclaves.

In the write and encrypt direction we then have:

auto const table1_row = Table1 {
    .pk = "...",                // 36 bytes, unencrypted
    .fk_users = "...",          // 36 bytes, unencrypted
    .fk_table3 = "...",         // 36 bytes, unencrypted
    .sensitive_data1 = 0,       //  4 bytes, to be encrypted
};

auto encrypted_field =
    encryptFieldB64(
        writeKey,
        writeKeyId,
        table1_row.sensitive_data1,
        AADtoBytes({"sensitive_data1", "table1", pk, fk_users, fk_table3}));

Json rowJson;
rowJson["pk"] = table1_row.id;
rowJson["fk_users"] = table1_row.fk_users;
rowJson["fk_table3"] = table1_row.fk_table3;
rowJson["sensitive_data1"] = encrypted_field;

// ...

And in the read and decrypt direction:

// ...

for (auto const & rowJson : table1Json) {
    auto & table1_row = result.emplace_back(Table1{
            .pk = rowJson["pk"],
            .fk_users = rowJson["fk_users"],
            .fk_users = rowJson["fk_table3"],
            .sensitive_data1 = {},
    });

    table1_row.sensitive_data1 = decryptFieldB64<decltype(table1_row.sensitive_data1)>(
            readKey,
            rowJson["sensitive_data1"].as_string(),
            AADtoBytes({"sensitive_data1", "table1", table1_row.pk, table1_row.fk_users, table1_row.fk_table3}));
}

These changes are reflected in the files clientApp/src/Main.cpp, clientApp/src/protectedTables/Table1.cpp, and the corresponding headers.

Now all is ready to use the updated table as you see fit, or to begin the live migration process as described in Section A.7.

A.3. Adding a new sensitive column

Adding a new sensitive column to an existing table is similar to adding a new table, but does not require adding any new routes or functions, only modifying existing ones.

As an example we will add the column sensitive_data5 to the table table1.

A.3.1. Database side

In the database we only add the new column to the schema:

CREATE TABLE IF NOT EXISTS table1
(
    pk UUID PRIMARY KEY,
    fk_users UUID NOT NULL,
    fk_table3 UUID NOT NULL,
    sensitive_data1 CHAR(56) NOT NULL,
    sensitive_data5 CHAR(56) NOT NULL,
    CONSTRAINT table1_fkey_users FOREIGN KEY (fk_users)
        REFERENCES users (pk) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT table1_fkey_table3 FOREIGN KEY (fk_table3)
        REFERENCES table3 (pk) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

The change is reflected in the file DB/init.sql.

A.3.2. Enclave side

As with the database, we only need to update the schemas in the read and write enclaves, as well as in the batch processing enclave. First we define the new column:

// Table1
COLUMN(pk);
COLUMN(fk_users);
COLUMN(fk_table3);
COLUMN(sensitive_data1);
COLUMN(sensitive_data5);

And then we add it to the schema

MetaData metaData{{"table1"}};
using TS = TableSchema<
        Public<UUID_SIZE, pk>,
        Public<UUID_SIZE, fk_users>,
        Public<UUID_SIZE, fk_table3>,
        Private<4 + EH_SIZE, sensitive_data1, AAD<tableName, AllPublicColumns>>,
        Private<4 + EH_SIZE, sensitive_data5, AAD<tableName, pk, fk_table3>>
        >;

Note that we have decided to only use two columns as part of the AAD.

The changes are reflected in the files tasks/src/writeEnclave/writeEnclave.cpp and tasks/src/readEnclave/readEnclave.cpp.

A.3.3. Webserver side

Analogously to adding a regular column, the new sensitive column must also be added to the database queries and the intermediary representation in the web-server:

type Table1 = {
    pk: string;                         // size: 36 bytes, unencrypted.
    fk_users: string;                   // size: 36 bytes, unencrypted.
    fk_table3: string;                  // size: 36 bytes, unencrypted.
    sensitive_data1: Uint8Array;        // size: 40 bytes, encrypted.
    sensitive_data5: Uint8Array;        // size: 40 bytes, encrypted.
}

function getTable1(session: HI.Session, request_info: ...) {
    // ...
    // Serialize the data into a binary, rectangular format.
    const bufs: Buffer[] = [];
    for (const row of inTable1) {
        bufs.push(Buffer.from(row.pk));
        bufs.push(Buffer.from(row.fk_users));
        bufs.push(Buffer.from(row.fk_table3));
        bufs.push(Buffer.from(row.sensitive_data1));
        bufs.push(Buffer.from(row.sensitive_data5));
    }
    // ...
    // ...
    // Transform bytes back into js objects/json
    const outTable1: Table1[] = [];
    for (let i = 0; i < bytes.length; i += TABLE1_TOTAL_BYTELENGTH) {
        outTable1.push({
            pk:                 bytes.toString('utf8', i +   0, i +  36),
            fk_users:           bytes.toString('utf8', i +  36, i +  72),
            fk_table3:          bytes.toString('utf8', i +  72, i + 108),
            sensitive_data1:    bytes.subarray(        i + 108, i + 148),
            sensitive_data5:    bytes.subarray(        i + 148, i + 188)
        });
    }
    return outTable1;
}

These changes are reflected in the files webServer/src/router.ts, webServer/src/queries.ts, and webServer/src/protectedTables/Table1.ts.

A.3.4. Client side

Finally we add encryption and decryption of the sensitive column to the client application. Note that the order of the elements in the AAD should also be the same as in the enclaves.

In the write and encrypt direction we then have:

auto const table1_row = Table1 {
    .pk = "...",                // 36 bytes, unencrypted
    .fk_users = "...",          // 36 bytes, unencrypted
    .fk_table3 = "...",         // 36 bytes, unencrypted
    .sensitive_data1 = 0,       //  4 bytes, to be encrypted
    .sensitive_data5 = 0,       //  4 bytes, to be encrypted
};

auto encrypted_field1 =
    encryptFieldB64(
        writeKey,
        writeKeyId,
        table1_row.sensitive_data1,
        AADtoBytes({"sensitive_data1", "table1", pk, fk_users, fk_table3}));
auto encrypted_field5 =
    encryptFieldB64(
        writeKey,
        writeKeyId,
        table1_row.sensitive_data5,
        AADtoBytes({"sensitive_data5", "table1", pk, fk_table3}));

Json rowJson;
rowJson["pk"] = table1_row.id;
rowJson["fk_users"] = table1_row.fk_users;
rowJson["fk_table3"] = table1_row.fk_table3;
rowJson["sensitive_data1"] = encrypted_field1;
rowJson["sensitive_data5"] = encrypted_field5;

// ...

And in the read and decrypt direction:

// ...

for (auto const & rowJson : table1Json) {
    auto & table1_row = result.emplace_back(Table1{
            .pk = rowJson["pk"],
            .fk_users = rowJson["fk_users"],
            .fk_users = rowJson["fk_table3"],
            .sensitive_data1 = {},
            .sensitive_data5 = {},
    });
    // ...
    table1_row.sensitive_data1 = decryptFieldB64<decltype(table1_row.sensitive_data1)>(
            readKey,
            rowJson["sensitive_data1"].as_string(),
            AADtoBytes({"sensitive_data1", "table1", table1_row.pk, table1_row.fk_users, table1_row.fk_table3}));
    table1_row.sensitive_data5 = decryptFieldB64<decltype(table1_row.sensitive_data5)>(
            readKey,
            rowJson["sensitive_data5"].as_string(),
            AADtoBytes({"sensitive_data5", "table1", table1_row.pk, table1_row.fk_table3}));
}

These changes are reflected in the files clientApp/src/Main.cpp, clientApp/src/protectedTables/Table1.cpp, and the corresponding headers.

Now all is ready to use the updated table as you see fit, or to begin the live migration process as described in Section A.7.

A.4. Deleting a sensitive column

Removing a sensitive column requires the smallest amount of changes, as it only involves removing the code relevant to that specific column. In essence we are doing the exact opposite of Section A.3. In this recipe we will remove the column sensitive_data2 from the table table2.

A.4.1. Database side

Only a single line has to be removed on the database side, leaving the schema as:

CREATE TABLE IF NOT EXISTS table2
(
    pk UUID PRIMARY KEY,
    fk_table1 UUID NOT NULL,
    regular_data1 CHAR(100) NOT NULL,
-   sensitive_data2 CHAR(136) NOT NULL,
    sensitive_data3 CHAR(56) NOT NULL,
    CONSTRAINT table2_fkey_table1 FOREIGN KEY (fk_table1)
        REFERENCES table1 (pk) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

The change is reflected in the file DB/init.sql.

A.4.2. Enclave side

Similarly, on the enclave side we remove the lines mentioning sensitive_data2. First we remove the column definition:

    // Table2
    //COLUMN(pk);
    COLUMN(fk_table1);
    COLUMN(regular_data1);
-   COLUMN(sensitive_data2);
    COLUMN(sensitive_data3);

And then from the schema:

MetaData metaData{{"table2"}};
using TS = TableSchema<
        Public<UUID_SIZE, pk>,
        Public<UUID_SIZE, fk_table1>,
        Public<100, regular_data1>,
-       Private<64 + EH_SIZE, sensitive_data2, AAD<tableName, AllPublicColumns>>,
        Private<4 + EH_SIZE, sensitive_data3, AAD<tableName, pk, regular_data1>>
        >;

The changes are reflected in the files tasks/src/writeEnclave/writeEnclave.cpp and tasks/src/readEnclave/readEnclave.cpp.

A.4.3. Webserver side

The same removal is repeated in all components of the web server. Note that the size and structure of the table have changed, so care must be taken to properly parse the byte streams.

type Table2 = {
    pk: string;                         // size: 36 bytes, unencrypted.
    fk_table1: string;                  // size: 36 bytes, unencrypted.
    regular_data1: string;              // size: 100 bytes, unencrypted.
-   sensitive_data2: Uint8Array;        // size: 64 bytes, encrypted.
    sensitive_data3: Uint8Array;        // size: 44 bytes, encrypted.
}

function getTable2(session: HI.Session, request_info: ...) {
    // ...
    // Serialize the data into a binary, rectangular format.
    const bufs: Buffer[] = [];
    for (const row of inTable2) {
        bufs.push(Buffer.from(row.pk));
        bufs.push(Buffer.from(row.fk_table1));
        bufs.push(Buffer.from(row.regular_data1));
-       bufs.push(Buffer.from(row.sensitive_data2));
        bufs.push(Buffer.from(row.sensitive_data3));
    }
    // ...
    // ...
    // Transform bytes back into js objects/json
    const outTable2: Table2[] = [];
    for (let i = 0; i < bytes.length; i += TABLE2_TOTAL_BYTELENGTH) {
        outTable2.push({
            pk:                 bytes.toString('utf8', i +   0, i +  36),
            fk_table1:          bytes.toString('utf8', i +  36, i +  72),
            regular_data1:      bytes.toString('utf8', i +  72, i + 172),
-           sensitive_data2:    bytes.subarray(        i + 172, i + 236),
-           sensitive_data3:    bytes.subarray(        i + 236, i + 280)
+           sensitive_data3:    bytes.subarray(        i + 172, i + 216)
        });
    }
    return outTable2;
}

These changes are reflected in the files webServer/src/router.ts, webServer/src/queries.ts, and webServer/src/protectedTables/Table2.ts.

A.4.4. Client side

Finally we remove the encryption and decryption operations in the client application. First in the write and encrypt direction:

auto const table2_row = Table2 {
    .pk = "...",                //  36 bytes, unencrypted
    .fk_table1 = "...",         //  36 bytes, unencrypted
    .regular_data1 = "...";     // 100 bytes, unencrypted.
-   .sensitive_data2 = "",      //  64 bytes, to be encrypted
    .sensitive_data3 = 0,       //   4 bytes, to be encrypted
};

-auto encrypted_field2 = encryptFieldB64(
-       writeKey,
-       writeKeyId,
-       table2_row.sensitive_data2,
-       AADtoBytes({"sensitive_data2", "table2", table2_row.pk, table2_row.fk_table1, table2_row.regular_data1}));
auto encrypted_field3 = encryptFieldB64(
        writeKey,
        writeKeyId,
        table2_row.sensitive_data3,
        AADtoBytes({"sensitive_data3", "table2", table2_row.pk, table2_row.regular_data1}));

Json rowJson;
rowJson["pk"] = table2_row.id;
rowJson["fk_table1"] = table2_row.fk_table1;
rowJson["regular_data1"] = table2_row.regular_data1;
-rowJson["sensitive_data2"] = encrypted_field2;
rowJson["sensitive_data3"] = encrypted_field3;

And in the read and decrypt direction:

// ...
for (auto const & rowJson : table2Json) {
    auto & table2_row = result.emplace_back(Table2{
         .pk = rowJson["pk"],
         .fk_users = rowJson["fk_table1"],
         .fk_users = rowJson["regular_data1"],
-        .sensitive_data2 = {},
         .sensitive_data3 = {},
    });
    // ...
-    table2_row.sensitive_data2 = decryptFieldB64<decltype(table2_row.sensitive_data2)>(
-        readKey,
-        table2JSON["sensitive_data2"].as_string(),
-        AADtoBytes({"sensitive_data2", "table2", table2_row.pk, table2_row.fk_table1, table2_row.regular_data1}));

    table2_row.sensitive_data3 = decryptFieldB64<decltype(table2_row.sensitive_data3)>(
        readKey,
        table2JSON["sensitive_data3"].as_string(),
        AADtoBytes({"sensitive_data3", "table2", table2_row.pk, table2_row.regular_data1}));
}

These changes are reflected in the files clientApp/src/Main.cpp and clientApp/src/protectedTables/Table2.cpp, and the corresponding headers.

Now all is ready to use the updated table as you see fit, or to begin the live migration process as described in Section A.7.

A.5. Deleting a regular column

Deleting a column that was not a part of any AAD is similar to deleting a sensitive column; however, if the column was used as part of some AAD, the situation is similar to adding a new regular column. If any previous data needs to be persisted, the whole table needs to be re-encrypted and migrated as described in Section A.7. This section assumes that no previous data is persisted and the system can be fully reset.

The example will walk through removing the column regular_data1 from table2.

A.5.1. Database side

Only a single line has to be removed on the database side:

CREATE TABLE IF NOT EXISTS table2
(
    pk UUID PRIMARY KEY,
    fk_table1 UUID NOT NULL,
-   regular_data1 CHAR(100) NOT NULL,
    sensitive_data3 CHAR(56) NOT NULL,
    CONSTRAINT table2_fkey_table1 FOREIGN KEY (fk_table1)
        REFERENCES table1 (pk) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

The change is reflected in the file DB/init.sql.

A.5.2. Enclave side

Similarly to removing a sensitive column, we first remove the column definition:

    // Table2
    //COLUMN(pk);
    COLUMN(fk_table1);
-   COLUMN(regular_data1);
    COLUMN(sensitive_data3);

And then remove the column from the schema and update the AAD of sensitive_data3:

MetaData metaData{{"table2"}};
using TS = TableSchema<
        Public<UUID_SIZE, pk>,
        Public<UUID_SIZE, fk_table1>,
-       Public<100, regular_data1>,
-       Private<4 + EH_SIZE, sensitive_data3, AAD<tableName, pk, regular_data1>>
+       Private<4 + EH_SIZE, sensitive_data3, AAD<tableName, pk>>
        >;

The changes are reflected in the files tasks/src/writeEnclave/writeEnclave.cpp and tasks/src/readEnclave/readEnclave.cpp.

A.5.3. Webserver side

Similarly to removing a sensitive column, we must update all components of the web server. Note that the size and structure of the table have changed, so care must be taken to properly parse the byte streams.

type Table2 = {
    pk: string;                         // size: 36 bytes, unencrypted.
    fk_table1: string;                  // size: 36 bytes, unencrypted.
-   regular_data1: string;              // size: 100 bytes, unencrypted.
    sensitive_data3: Uint8Array;        // size: 44 bytes, encrypted.
}

function getTable2(session: HI.Session, request_info: ...) {
    // ...
    // Serialize the data into a binary, rectangular format.
    const bufs: Buffer[] = [];
    for (const row of inTable2) {
        bufs.push(Buffer.from(row.pk));
        bufs.push(Buffer.from(row.fk_table1));
-       bufs.push(Buffer.from(row.regular_data1));
        bufs.push(Buffer.from(row.sensitive_data3));
    }
    // ...
    // ...
    // Transform bytes back into js objects/json
    const outTable2: Table2[] = [];
    for (let i = 0; i < bytes.length; i += TABLE2_TOTAL_BYTELENGTH) {
        outTable2.push({
            pk:                 bytes.toString('utf8', i +   0, i +  36),
            fk_table1:          bytes.toString('utf8', i +  36, i +  72),
-           regular_data1:      bytes.toString('utf8', i +  72, i + 172),
-           sensitive_data3:    bytes.subarray(        i + 172, i + 216)
+           sensitive_data3:    bytes.subarray(        i + 72, i + 116)
        });
    }
    return outTable2;
}

These changes are reflected in the files webServer/src/router.ts, webServer/src/queries.ts, and webServer/src/protectedTables/Table2.ts.

A.5.4. Client side

Finally we update the AAD in the encryption and decryption operations in the client application. First in the write and encrypt direction:

auto const table2_row = Table2 {
    .pk = "...",                //  36 bytes, unencrypted
    .fk_table1 = "...",         //  36 bytes, unencrypted
-   .regular_data1 = "...",     // 100 bytes, unencrypted
    .sensitive_data3 = 0,       //   4 bytes, to be encrypted
};

auto encrypted_field3 = encryptFieldB64(
        writeKey,
        writeKeyId,
        table2_row.sensitive_data3,
-       AADtoBytes({"sensitive_data3", "table2", table2_row.pk, table2_row.regular_data1}));
+       AADtoBytes({"sensitive_data3", "table2", table2_row.pk}));

Json rowJson;
rowJson["pk"] = table2_row.id;
rowJson["fk_table1"] = table2_row.fk_table1;
-rowJson["regular_data1"] = table2_row.regular_data1;
rowJson["sensitive_data3"] = encrypted_field3;

And in the read and decrypt direction:

for (auto const & rowJson : table2Json) {
    auto & table2_row = result.emplace_back(Table2{
         .pk = rowJson["pk"],
         .fk_users = rowJson["fk_table1"],
         .sensitive_data3 = {},
    });
    // ...
    table2_row.sensitive_data3 = decryptFieldB64<decltype(table2_row.sensitive_data3)>(
        readKey,
        table2JSON["sensitive_data3"].as_string(),
-       AADtoBytes({"sensitive_data3", "table2", table2_row.pk, table2_row.regular_data1}));
+       AADtoBytes({"sensitive_data3", "table2", table2_row.pk}));
}

These changes are reflected in the files clientApp/src/Main.cpp and clientApp/src/protectedTables/Table2.cpp, and the corresponding headers.

Now all is ready to use the updated table as you see fit, or to begin the live migration process as described in Section A.7.

A.6. Advanced - Adding new data processing

Adding new data processing operations can be done in two ways, depending on the use case. In the simpler case, additional routes can be added to the existing batch processing enclave, similarly to the routes in the read and write enclaves, where different processing operations can be performed. In more complex cases a new enclave might be needed. As the new enclave would not have access to the old storage keys, a new key must be generated (using the key generation enclave) after adding the new processing enclave, and the whole database must be re-encrypted with the newly generated storage key. The re-encryption process is further described in Section A.7.

In this section we will add a new route to the batch processing enclave which will sum all values of sensitive_data4 in table3.

A.6.1. Database side

No changes are needed in the database.

A.6.2. Enclave side

The largest changes happen in the batch processing enclave, where we first have to add support for different routes, and then add the new route.

Adding routing to the enclave consists of two steps: adding the route parameter to the user-provided arguments list, and using the argument to select a route. The task argument list is defined in tasks/src/context/Context.h in the BatchProcessingContext class. This is also where new custom input parameters can be defined:

        HI_ARG(std::string, input_file);
        HI_ARG(std::string, output_file);
        HI_ARG(std::uint64_t, output_key_id);
+       HI_ARG(std::string, route);
        HI_ARG(std::string, permission_data);
        HI_ARG(std::string, admin_signature_1);
        HI_ARG(std::string, admin_signature_2);

        // Example additional user defined argument to modify the output of the report.
        HI_ARG(std::int64_t, customInputParameter);

Now we can use the route argument to add the routing for the existing processing in tasks/src/batchProcessingEnclave/batchProcessingEnclave.cpp:

void run(TaskInputs const & inputs, TaskOutputs &) {

    // ...

    // #16
    TODO_doSomethingWithPermissions(ctx.permissions);
+   if (ctx.args.route == "route1") {
        /* #17 #18 Set-up table schema */
        MetaData inputTableName{{"table1"}};

        // ...
        // Existing processing code is here
        // ...

        ctx.buildOutput<TS_OUT>(outputTableName, outputFn, aggregation_map);
+   }
}

Adding a new route consists of five steps: defining schemas for the new input and output tables; defining any intermediate result structures; defining the processing; defining the conversion from the intermediate structure to the output table; invoking the processing.

The example uses table3 as an input, and outputs a table with a single row and column. As the processing does not use any custom parameters, we do not need to output the hash of the parameter. As such the result will not have any public columns and the AAD of the result will consist of only the table and column names.

Note that the signatures of the buildOutput and outputFn functions (defined in tasks/src/context/Context.h) are made specifically for the route1 use case and use a map as the intermediate structure to store the results. For simplicity we will not modify the function signatures or how the buildOutput function processes the output rows. Instead we will format the result as a map with a single entry to comply with existing function signatures. For more complex use cases it is also possible to define a new buildOutput function for each new use case by adding new structures in tasks/src/context/Context.h and tasks/src/context/Context_impl.h.

// ...
if (ctx.args.route == "route1") {
    // ...
} else if (ctx.args.route == "route2") {
    // Define input and output tables
    MetaData inputTableName{{"table3"}};
    using TS_IN = TableSchema<
            Public<UUID_SIZE, pk>,
            Private<4 + EH_SIZE, sensitive_data4, AAD<tableName, pk>>>;
    using Row_IN = TS_IN::DecryptedRow;

    MetaData outputTableName{{"processing_results"}};
    using TS_OUT =
            TableSchema<Private<4 + EH_SIZE, result, AAD<tableName>>>;
    using Row_OUT = TS_OUT::DecryptedRow;

    // Define intermediate result structures
    using MAP = std::map<uint8_t, int32_t>;
    using MAP_VALUES = const std::pair<std::remove_const_t<MAP::value_type::first_type>,
                                       std::remove_const_t<MAP::value_type::second_type>>;
    MAP result_as_map{};
    result_as_map[0] = 0;

    // This function reads the input table and performs the processing
    auto processingFn = [&](Row_IN const & row) {
        result_as_map[0] += bit_cast<int32_t>(row.sensitive_data4);
    };
    // This function transforms the intermediate structure to match the output schema
    auto outputFn = [&](MAP_VALUES & row) {
        Row_OUT rowOUT = {};
        // We disregard the key as we only need the value in this use case.
        rowOUT.result = bit_cast<decltype(rowOUT.result)>(row.second);
        return rowOUT;
    };

    // Start the processing
    ctx.process<TS_IN>(inputTableName, processingFn);
    ctx.buildOutput<TS_OUT>(outputTableName, outputFn, result_as_map);
}

Now the new processing has been added to the enclave and all that is left is to provide the required tables and invoke the enclave.

A.6.3. Webserver side

Adding the new batch processing route can be done in various ways, depending on your use case. For example, whole new API endpoints, functions and classes can be added to accommodate the change, or the existing ones could be modified to choose the route based on user input. Regardless of how you implement the new route in your webserver, some operations are common to all approaches. First of all, you have to provide the table corresponding to the selected route (in the example Table1 for route1, and Table3 for route2) and correctly parse the output table (the output of route2 consists of a single column, whereas the output of route1 has three columns). You also have to provide the route as a TaskArgument to the enclave invocation. The result should look somewhat analogous to this:

type ProcessingResult1 = {
    hash: string;                 // size: 32 bytes, unencrypted
    fk_users: string;             // size: 36 bytes, unencrypted
    result: string;               // size: 40 bytes, encrypted
}
type ProcessingResult2 = {
    result: string;               // size: 40 bytes, encrypted
}

async function batchProcess(session: HI.Session, request_info: ...) {
    const permission_info = query_db_permissions(request_info);

    const bufs: Buffer[] = [];
    if (request_info.route == "route1") {
        const table: Table1[] = query_db_table1(request_info);
        for (const row of table) {
            bufs.push(Buffer.from(row.pk));
            // ... as for other use cases.
        }
    } else if (request_info.route == "route2") {
        const table: Table3[] = query_db_table3(request_info);
        for (const row of table) {
            bufs.push(Buffer.from(row.pk));
            // ... as for other use cases.
        }
    } else {
        console log("Unknown route!");
        return;
    }

    writeFileSync(input_file, Buffer.concat(bufs));
    await session.taskRunSync(new mt.TaskRunRequest(
            new mt.ArbitraryName("batch_processing_enclave"),
            new mt.RequiredTrustedEnforcers([]),
            [
                new mt.TaskArgument("input_file", input_file),
                new mt.TaskArgument("output_file", output_file),
                new mt.TaskArgument("permission_data", permission_info[0]),
                new mt.TaskArgument("admin_signature_1", permission_info[1]),
                new mt.TaskArgument("admin_signature_2", permission_info[2]),
                new mt.TaskArgument("customInputParameter", request_info.customInputParameter),
                new mt.TaskArgument("route", request_info.route)
            ]));
    const bytes = readFileSync(output_file);

    if (request_info.route == "route1") {
        const outputs: ProcessingResult1[] = [];
        for (let i = 0; i < bytes.length; i += RESULT1_TOTAL_BYTELENGTH) {
            outputs.push({
                // ... as for other use cases.
            });
        }
        return outputs;
    } else if (request_info.route == "route2") {
        const outputs: ProcessingResult2[] = [];
        for (let i = 0; i < bytes.length; i += RESULT2_TOTAL_BYTELENGTH) {
            outputs.push({
                // ... as for other use cases.
            });
        }
        return outputs;
    }
}

These changes are reflected in the files webServer/src/router.ts, webServer/src/server.ts, webServer/src/HICommunication.ts, webServer/src/protectedTables/ProcessingResult1.ts and webServer/src/protectedTables/ProcessingResult2.ts.

A.6.4. Client side

The client side remains similar to the existing batch processing calls; however, care must be taken to parse the returned JSON according to the selected route. Note that the newly added route route2 does not have a hash column, and its single sensitive column only has the table and column names as its AAD.

const auto & outputJson = your_api.put("/batch_processing2/" + user_id + "/" + readKeyId);

ProcessingResult2 output{ .result = {} };

output.result = decryptFieldB64<decltype(output.result)>(
        readKey,
        outputJson["result"].as_string(),
        AADtoBytes({"result", "processing_results"}));

These changes are reflected in the files clientApp/src/Main.cpp, clientApp/src/communication/ServerCommunication.cpp, clientApp/src/protectedTables/ProcessingResult1.cpp and clientApp/src/protectedTables/ProcessingResult2.cpp, and the corresponding headers.

The new batch processing is now in place and can be used as needed.

A.7. Advanced - Re-encryption and migration in a live environment

There are three distinct cases where re-encryption and migration might be needed. In the order of increasing complexity:

  1. A new enclave has been added, that needs to read previously encrypted data

  2. A regular column (that is a part of some AAD) is added or removed

  3. A new sensitive column is added

While the inner complexity of each case is slightly different, the general flow for all cases is the same:

  1. Define the conversion between the old and new schemas in the reencryption_enclave and add new routes to support the updated schema after re-encryption

  2. Complete the DFC upgrade procedure, so that the changes in the enclaves take effect

  3. Fetch all rows of the changing table from the database according to the old schema

  4. Re-encrypt and convert the table to match the new schema using the reencryption_enclave

  5. Update the schema in the database and the rest of the system, and insert the re-encrypted table back into the database as required

  6. Remove the re-encryption route and all references to the old schema from the enclaves and the system

  7. Perform another DFC upgrade to confirm and finalize the changes

Changing the schema of the tables naturally comes with the problem of preserving business continuity. As preserving business continuity is highly dependent on the use case and the available tools, we will assume that no read or write operations are allowed during the re-encryption and migration process. To ensure that data is not lost during the migration, it is recommended to create new routes and structures for the updated table, and to remove the old routes from the web server and the initial table from the database only once the migration has completed successfully.

It is important to note that in all cases, the re-encryption enclave does not require permission data or read/write key IDs. This is because the re-encryption enclave only uses storage keys and no person will receive the plaintext output.

Additionally, as the re-encryption enclave is only required when the database schema changes or new enclaves are added, the end-user should not be able to initiate it. The re-encryption should only be initiated by administrators during the migration process.

A.7.1. Adding a new enclave

The simplest re-encryption process happens when the DB schema does not change. In this case we only need to define the schemas of our tables in the reencryption_enclave, identically to adding routes in the read and write enclaves, and invoke the enclave. For example, re-encryption of table3 requires defining the following route in the enclave:

} else if (ctx.args.route == "table3_inplace") {
    MetaData metaData{{"table3"}};
    using TS = TableSchema<
            Public<UUID_SIZE, pk>,
            Private<4 + EH_SIZE, sensitive_data4, AAD<tableName, AllPublicColumns>>
            >;
    ctx.reencrypt<TS>(metaData);
}

Note that after adding a new route in the reencryption_enclave, the signature of the enclave changes and as such a DFC upgrade procedure has to be completed before the new procedure can be used.

Similarly, only small changes are required on the web server side, compared to the read and write flows:

async function reencTable3(session: HI.Session, request_info: ...) {
    // Query the table from your DB
    const inTable3: Table3[] = query_db_table3(request_info);

    // Serialize as with other tables
    const bufs: Buffer[] = [];
    for (const row of inTable3) { ... }

    writeFileSync(input_file, Buffer.concat(bufs));
    await session.taskRunSync(new mt.TaskRunRequest(
            new mt.ArbitraryName("reencryption_enclave"),
            new mt.RequiredTrustedEnforcers([]),
            [
                new mt.TaskArgument("input_file", input_file),
                new mt.TaskArgument("output_file", output_file),
                new mt.TaskArgument("route", "route3")
            ]));
    const bytes = readFileSync(output_file);

    // Deserialize as for other tables and insert it back to the DB
    // ...
}

As no changes in the schema occurred, there is no need to create any temporary structures in the database, change any of the other enclaves, or add any routes in the other components.

A.7.2. Adding a regular column

Adding a column that is a part of some AAD has two additional points of complexity when compared to simple re-encryption: converting the table to the new schema, and preserving business continuity during the migration process.

First, both the input and output schemas have to be defined in the re-encryption enclave, along with a function converting the input to the output. Depending on the use case and the options available to you, there are various ways to define the schema change in the re-encryption enclave. If the added column can be set to some fixed value for all rows (e.g. a null value), then the conversion function can be hardcoded to set the appropriate value for each row:

} else if (ctx.args.route == "table1_add_regular") {
    MetaData metaData{{"table1"}};
    // If the default value for the new column can be set to a fixed value,
    // it can be hardcoded in the conversion function
    using TS_IN = TableSchema<
            Public<UUID_SIZE, pk>,
            Public<UUID_SIZE, fk_users>,
            Private<4 + EH_SIZE, sensitive_data1, AAD<tableName, AllPublicColumns>>
            >;
    using TS_OUT = TableSchema<
            Public<UUID_SIZE, pk>,
            Public<UUID_SIZE, fk_users>,
            Public<UUID_SIZE, fk_table3>,
            Private<4 + EH_SIZE, sensitive_data1, AAD<tableName, AllPublicColumns>>
            >;
    auto conversionFn = [&](TS_IN::DecryptedRow & rowIN) {
        TS_OUT::DecryptedRow rowOUT = {};
        rowOUT.pk = rowIN.pk;
        rowOUT.fk_users = rowIN.fk_users;
        rowOUT.sensitive_data1 = rowIN.sensitive_data1;
        rowOUT.fk_table3 = "SOME_CONSTANT_FIXED_VALUE           ";
        return rowOUT;
    };
    ctx.reencrypt<TS_IN, TS_OUT>(metaData, conversionFn);
}

Alternatively, as the column is a public value, it is possible to first change the schema of the table and insert the desired values to the table externally, before providing it to the enclave, and in the enclave only change the AAD of the input and output schemas:

} else if (ctx.args.route == "table1_add_regular") {
    MetaData metaData{{"table1"}};
    // If the default value for the new column can be provided externally,
    // then the input table should already contain the new regular column
    // with the correct new value, but it should not be a part of the AAD
    using TS_IN = TableSchema<
            Public<UUID_SIZE, pk>,
            Public<UUID_SIZE, fk_users>,
            Public<UUID_SIZE, fk_table3>,
            Private<4 + EH_SIZE, sensitive_data1, AAD<tableName, pk, fk_users>>
            >;
    using TS_OUT = TableSchema<
            Public<UUID_SIZE, pk>,
            Public<UUID_SIZE, fk_users>,
            Public<UUID_SIZE, fk_table3>,
            Private<4 + EH_SIZE, sensitive_data1, AAD<tableName, AllPublicColumns>>
            >;
    auto conversionFn = [&](TS_IN::DecryptedRow & rowIN) {
        // As the plaintext of the schemas are identical we can simply copy the fields
        TS_OUT::DecryptedRow rowOUT = {};
        rowOUT.pk = rowIN.pk;
        rowOUT.fk_users = rowIN.fk_users;
        rowOUT.sensitive_data1 = rowIN.sensitive_data1;
        rowOUT.fk_table3 = rowIN.fk_table3;
        return rowOUT;
    };
    ctx.reencrypt<TS_IN, TS_OUT>(metaData, conversionFn);
}

The web server side remains similar to the regular re-encryption. Using the second approach, we only need to implement the function that inserts values in the new column:

async function addRegularToTable1(session: HI.Session, request_info: ...) {
    // Query the table with the old schema from DB
    const table1: Table1[] = query_db_table1(request_info);

    // Optionally insert the new column with some row based public values
    const inTable1 = addColumn(table1);

    // Serialize as with other tables
    const bufs: Buffer[] = [];
    for (const row of inTable1) { ... }

    writeFileSync(input_file, Buffer.concat(bufs));
    await session.taskRunSync(new mt.TaskRunRequest(
            new mt.ArbitraryName("reencryption_enclave"),
            new mt.RequiredTrustedEnforcers([]),
            [
                new mt.TaskArgument("input_file", input_file),
                new mt.TaskArgument("output_file", output_file),
                new mt.TaskArgument("route", "table1_add_regular")
            ]));
    const bytes = readFileSync(output_file);

    // Deserialize as for other tables and insert it back to the DB with the new schema
    // ...
}
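
The addColumn function above is deliberately left abstract, as it stands for whatever logic produces the public values of the new column. A minimal sketch of one possible implementation, where lookupFkTable3 is a hypothetical per-row lookup:

// Hypothetical per-row lookup of the new public value;
// replace with your own business logic.
function lookupFkTable3(pk: string): string {
    return "00000000-0000-0000-0000-000000000000"; // 36-character placeholder UUID
}

// Extends each row of the old schema with the new public column before
// the table is serialized and passed to the re-encryption enclave.
function addColumn(rows: Omit<Table1, "fk_table3">[]): Table1[] {
    return rows.map((row) => ({ ...row, fk_table3: lookupFkTable3(row.pk) }));
}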

Once the table has been re-encrypted and inserted into the database, the re-encryption route and all references to the old schema can be removed. Note that changing the enclaves requires performing the DFC upgrade procedure again.

A.7.3. Removing a regular column

Removing a regular column is analogous to adding a column. The only change is in the conversion function in the re-encryption enclave, where the removed column is ignored:

} else if (ctx.args.route == "table2_remove_regular") {
    MetaData metaData{{"table2"}};
    using TS_IN = TableSchema<
        Public<UUID_SIZE, pk>,
        Public<UUID_SIZE, fk_table1>,
        Public<100, regular_data1>,
        Private<4 + EH_SIZE, sensitive_data3, AAD<tableName, pk, regular_data1>>>;
    using TS_OUT = TableSchema<
        Public<UUID_SIZE, pk>,
        Public<UUID_SIZE, fk_table1>,
        Private<4 + EH_SIZE, sensitive_data3, AAD<tableName, pk>>>;
    auto conversionFn = [&](TS_IN::DecryptedRow &rowIN) {
        // Existing columns can be copied
        TS_OUT::DecryptedRow rowOUT = {};
        rowOUT.pk = rowIN.pk;
        rowOUT.fk_table1 = rowIN.fk_table1;
        rowOUT.sensitive_data3 = rowIN.sensitive_data3;
        // The removed column is ignored
        return rowOUT;
    };
    ctx.reencrypt<TS_IN, TS_OUT>(metaData, conversionFn);
}

The rest of the code is identical to the previous use cases.

A.7.4. Adding a sensitive column

While similar to adding a regular column, adding a sensitive column is more restricted in its format, as default or null values can not be provided externally. In the simplest case, the field can be filled with some constant fixed value:

} else if (ctx.args.route == "table1_add_sensitive") {
    MetaData metaData{{"table1"}};
    // Sensitive columns can not be provided externally,
    // so they have to be set to some constant default
    using TS_IN = TableSchema<
            Public<UUID_SIZE, pk>,
            Public<UUID_SIZE, fk_users>,
            Public<UUID_SIZE, fk_table3>,
            Private<4 + EH_SIZE, sensitive_data1, AAD<tableName, AllPublicColumns>>
            >;
    using TS_OUT = TableSchema<
            Public<UUID_SIZE, pk>,
            Public<UUID_SIZE, fk_users>,
            Public<UUID_SIZE, fk_table3>,
            Private<4 + EH_SIZE, sensitive_data1, AAD<tableName, AllPublicColumns>>,
            Private<10 + EH_SIZE, sensitive_data5, AAD<tableName, pk, fk_table3>>
            >;

    auto conversionFn = [&](TS_IN::DecryptedRow & rowIN) {
        // Existing columns can be copied
        TS_OUT::DecryptedRow rowOUT = {};
        rowOUT.pk = rowIN.pk;
        rowOUT.fk_users = rowIN.fk_users;
        rowOUT.sensitive_data1 = rowIN.sensitive_data1;
        rowOUT.fk_table3 = rowIN.fk_table3;
        // The added column is set to some default value. Note that the source
        // of the bit_cast must have exactly the size of the column's plaintext
        // (here 10 bytes), so a string literal, which carries an extra
        // terminating '\0', can not be used directly.
        rowOUT.sensitive_data5 = bit_cast<decltype(rowOUT.sensitive_data5)>(
                std::array<char, 10>{'N', 'U', 'L', 'L', ' ', ' ', ' ', ' ', ' ', ' '});
        return rowOUT;
    };
    ctx.reencrypt<TS_IN, TS_OUT>(metaData, conversionFn);
}

Alternatively, it is possible to provide row based inputs by adding a temporary public column to the input schema, that will be used in the conversion function:

} else if (ctx.args.route == "table1_add_sensitive") {
    MetaData metaData{{"table1"}};
    using TS_IN = TableSchema<
            Public<UUID_SIZE, pk>,
            Public<UUID_SIZE, fk_users>,
            Public<UUID_SIZE, fk_table3>,
            Public<10, temporary>,
            Private<4 + EH_SIZE, sensitive_data1, AAD<tableName, pk, fk_users, fk_table3>>
            >;
    using TS_OUT = TableSchema<
            Public<UUID_SIZE, pk>,
            Public<UUID_SIZE, fk_users>,
            Public<UUID_SIZE, fk_table3>,
            Private<4 + EH_SIZE, sensitive_data1, AAD<tableName, AllPublicColumns>>,
            Private<10 + EH_SIZE, sensitive_data5, AAD<tableName, pk, fk_table3>>
            >;

    auto conversionFn = [&](TS_IN::DecryptedRow & rowIN) {
        // Existing columns can be copied
        TS_OUT::DecryptedRow rowOUT = {};
        rowOUT.pk = rowIN.pk;
        rowOUT.fk_users = rowIN.fk_users;
        rowOUT.sensitive_data1 = rowIN.sensitive_data1;
        rowOUT.fk_table3 = rowIN.fk_table3;
        // The added column is taken from the temporary column
        rowOUT.sensitive_data5 = bit_cast<decltype(rowOUT.sensitive_data5)>(rowIN.temporary);
        return rowOUT;
    };
    ctx.reencrypt<TS_IN, TS_OUT>(metaData, conversionFn);
}

The rest of the code and operations are identical to the previous use cases.

A.8. Changing admins/admin_signatures

Changing the certificates of the stakeholders admin1 or admin2, whether due to changes in your organizational structure, regular key rotation, or any other reason, requires performing the DFC upgrade procedure, so that the new certificate would replace the one listed in the DFC.

Additionally, as the enclaves verify that the permission_string of an end-user is signed by both of the admin stakeholders, the signature of the changed stakeholder has to be updated accordingly for each end-entity’s permission_string. This means that the changed stakeholder has to sign all of the permission_strings in the user credential database and the corresponding signature field has to be updated with the new signature. The signing procedure is described in Section 5.4.4 and can be automated.
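
Purely as an illustration of such automation (the authoritative signing procedure remains the one described in Section 5.4.4), a sketch using Node's crypto module; the digest algorithm, the signature encoding, and the database helpers are assumptions that have to match your actual scheme:

import { createSign } from "node:crypto";

// Re-signs every permission_string with the new admin key and updates the
// corresponding signature column. query_all_users and update_admin_signature
// are hypothetical database helpers.
async function resignPermissions(newAdminKeyPem: string) {
    for (const user of await query_all_users()) {
        const signer = createSign("SHA256");
        signer.update(user.permission_string);
        await update_admin_signature(user.pk, signer.sign(newAdminKeyPem, "base64"));
    }
}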

Finally, if admin1 changes (the stakeholder that assigns the end-user certificates), all certificates of end-users become invalid. As such, each end-user needs to be assigned a new certificate by the updated admin1 stakeholder. This process can also be automated by signing the original CSRs (as described in Section 5.4.2) of all users and updating the certificate fields in the user credential database.

A.9. Updating the structure of the permission_string

As the structure of the permission_string is not defined or enforced by the PEDB architecture, you are free to change the structure as you see fit. There are, however, some key points to keep in mind when doing so. Firstly, if the permission_string of any user changes, then the resulting permission_string has to be signed by both admin stakeholders and the signature fields in the user credential database have to be updated accordingly. Additionally, if you have implemented any additional access controls inside the enclaves based on the permission_string (namely, if you have implemented the TODO_doSomethingWithPermissions function in the batchProcessingEnclave, readEnclave, or writeEnclave), you will need to update the function to reflect the new format.
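
For example, a permission_string could be (but does not have to be) a small JSON document. The following shape is purely hypothetical:

// Hypothetical permission_string content; the PEDB architecture itself
// imposes no structure on it.
const permission_string = JSON.stringify({
    user_id: "00000000-0000-0000-0000-000000000000",
    read:  ["table1", "table2"],
    write: ["table1"],
});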

Appendix B: Restrictions

The PEDB architecture is not compatible with all use-cases and requirements. This section will highlight some restrictions and limitations of the solution.

B.1. Vulnerabilities

You can read more about the vulnerabilities of Intel® SGX from the Cybernetica research report: An Overview of Vulnerabilities and Mitigations of Intel SGX Applications.

B.2. Joining on sensitive data

Normal database joins can not be done on sensitive values, as each ciphertext is unique, even when the underlying plaintext values are identical. However, public values can be used to join tables using normal database operations, and in most cases the primary key does not have to be encrypted.
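
For example, with the demo data model the business tables can still be joined on their public columns, with the sensitive columns simply carried along as opaque ciphertext:

SELECT t1.pk, t1.sensitive_data1, t2.sensitive_data3
FROM table1 t1
JOIN table2 t2 ON t2.fk_table1 = t1.pk;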

B.3. GPU processing

It is currently not possible to create enclaves that utilize the GPU, as Intel® SGX only works on Intel® processors and its protections do not extend to the GPU. This severely limits some forms of processing that require the extra performance of GPUs, most notably machine learning workflows.

B.4. Multi threaded processing

Sharemind HI currently works only as a single-threaded application, which can limit the performance of applications. While multi-threading is allowed within Intel® SGX, using multiple threads inside enclaves brings a number of security concerns, which are also outlined in the vulnerability research report mentioned in Section B.1.

It is, however, possible to run different enclaves in parallel by duplicating the enclave under different names. This allows multiple instances of the same workflow to run in parallel.

B.5. Multiple connected instances of Sharemind HI

Each instance of Sharemind HI generates its own data encryption keys, that are not accessible to other instances of Sharemind HI, even if the instances exist on the same physical machine. Even if the PEDB is shared between the instances, neither would be able to decrypt values encrypted by the other instance, as they can not access the storage keys used by the other instance.

Sharing data between Sharemind HI instances is on the roadmap and may become a part of future releases.


1. Patent Pending
2. This algorithm is chosen as it underlies the whole Intel® SGX security, but any other Authenticated Encryption with Associated Data (AEAD) scheme works, too.
3. CREATE TABLE keys (user_id UUID PRIMARY KEY, encr_key_PEM VARCHAR, cert_PEM VARCHAR)