Sharemind HI Overview

1. Introduction

Sharemind HI is the right tool if you need to analyse data from multiple parties in a central place while keeping the risk of a data breach to a minimum. It is a development platform for confidential analysis of data from multiple parties on a centralized server, with full control over how the data and results are exposed to others. It builds on the Intel® Software Guard Extensions (Intel® SGX) and their strong cryptographic protection, and gives data owners a way to remotely ensure that their data is used only in ways they have agreed to.

Figure 1. Sharemind HI uses dedicated hardware support to technically enforce data protection, even against root users.

Intel® SGX, a Trusted Execution Environment (TEE) technology, is at the core of Sharemind HI and is how it achieves privacy. It provides the means to protect data in transit, during processing and at rest. Sharemind HI hides most of the cryptography involved and lets the developer focus on the functionality inside the task enclaves and on the end-user side.

2. Typical Use Cases

Sharemind HI is typically used in a solution in one of the following ways:

Figure 2. Sharemind HI supports different use cases. The ones shown are common patterns that we have encountered frequently.

Batch Analysis

One or more input providers upload data. The task is invoked to perform the analytics. When the task has finished, possibly after a long time, an output consumer downloads the results. This pattern suits a one-off analysis, or a periodic job that ingests pseudonymised data and creates reports over the de-pseudonymised data.

Outsourcing Heavy Computations

Users outsource a heavy computation on confidential data to a cloud environment. An example is training machine learning models on an individual user’s data without sharing the data or the trained models with other users.

3. The Dataflow Configuration

Sharemind HI realises access control by compartmentalizing data into topics, moving algorithms into tasks (task enclaves or the upcoming task VMs), and describing in the Dataflow Configuration which user or task can read from or write to which topic. This can be pictured as a graph like the following example:

Figure 3. Data between users and tasks moves through persistent topics, or ephemerally as request arguments. The Dataflow Configuration declaratively describes who can access which topic, who can invoke which task, and more.
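
The graph idea can be made concrete with a small model. The following is only a sketch in Python, not the actual Dataflow Configuration schema or any Sharemind HI API: the class `Dataflow`, its fields, and the names `provider`, `consumer`, `runner`, `analysis_task`, `input` and `output` are all hypothetical and chosen to mirror the Batch Analysis pattern.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Dataflow:
    """Toy model of a dataflow configuration (hypothetical, not the real schema)."""
    readers: Dict[str, Set[str]] = field(default_factory=dict)   # topic -> users/tasks
    writers: Dict[str, Set[str]] = field(default_factory=dict)   # topic -> users/tasks
    invokers: Dict[str, Set[str]] = field(default_factory=dict)  # task -> users

    def can_read(self, principal: str, topic: str) -> bool:
        return principal in self.readers.get(topic, set())

    def can_write(self, principal: str, topic: str) -> bool:
        return principal in self.writers.get(topic, set())

    def can_invoke(self, user: str, task: str) -> bool:
        return user in self.invokers.get(task, set())


# Hypothetical graph: provider -> input -> analysis_task -> output -> consumer
dfc = Dataflow(
    writers={"input": {"provider"}, "output": {"analysis_task"}},
    readers={"input": {"analysis_task"}, "output": {"consumer"}},
    invokers={"analysis_task": {"runner"}},
)

print(dfc.can_read("analysis_task", "input"))  # True: the task may read raw input
print(dfc.can_read("consumer", "input"))       # False: the consumer sees only results
```

Anything not listed is denied in this model, which matches the declarative spirit of the description above: the consumer can read the output topic but never the raw input.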

4. Development

A Sharemind HI solution consists of a set of ready-made, generic Sharemind HI components and new, solution-specific functionality. The latter comprises the software that interacts with the Sharemind HI Server and the actual business logic running inside the Sharemind HI Server:

The project-specific application/website/service

This component wraps the Sharemind HI Client to upload or download confidential data, invoke task enclaves and more. You may need to develop different variants of this component to match the needs of each stakeholder and their roles.
The Sharemind HI Client is available for multiple platforms as a C++ library and a TypeScript library, as well as a Linux CLI.

The project-specific tasks

These enclaves or VMs[1] contain the solution-specific business logic. They can access the raw confidential data that was uploaded by data providers. Thanks to Intel® SGX, the raw confidential data is protected from any other user or software running on the same machine, even while it is being processed by the task enclaves.
Intel® SGX enclaves provide a restricted environment with a minimal attack surface, so most functionality needs to be either ported to the enclave environment or written from scratch. With VMs, code reuse is much easier, but the amount of software inside the VM is much larger, and so is the attack surface.

Figure 4. A high level overview of what new software needs to be written to make use of Sharemind HI.

We provide separate development and production builds. The "Hello World!" of Sharemind HI development is a Batch Analysis style project, which takes only the following steps:

# Create a CMake-based project template.
$ sharemind-hi-create-task-enclave-project --output /tep --…
# Edit the relevant files.
$ nvim /tep/test/dataflow-configuration.yaml
$ nvim /tep/src/enclave/Enclave.cpp
$ nvim /tep/test/client.sh
# Build and run the tests.
$ cd /tep/build
$ make -j4
$ ctest -V

5. Deployment

Figure 5. Deploying a Sharemind HI project into a production environment requires a series of steps to ensure that end users interact with the trustworthy service. This communication diagram is slightly exaggerated, though.

When the task enclave code is ready, several steps are required to get it running in a production environment:

  1. Send the stakeholder certificates to the other stakeholders and end users.

  2. Create a release build of the enclaves.

  3. Have an auditor verify that the task enclave code and the dataflow configuration adhere to the specification.

  4. Distribute the trusted enclave fingerprints (as well as the client-side software) to stakeholders and end users.

  5. Create the server configuration.

  6. Have the enforcers approve the Sharemind HI Server and the dataflow configuration (DFC).

Note that any change to the task enclave code requires you to repeat most of the steps above and to perform a dataflow configuration upgrade.
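
Why a code change forces most of these steps to be repeated can be illustrated with a small sketch. It assumes, purely for illustration, that a fingerprint behaves like a cryptographic hash over the enclave build; a real Intel® SGX measurement is computed over the enclave as it is loaded, not as a plain file hash. The functions `fingerprint` and `is_trusted` and the byte strings are hypothetical and not part of Sharemind HI.

```python
import hashlib
from typing import Set


def fingerprint(enclave_build: bytes) -> str:
    """Stand-in for an enclave fingerprint: SHA-256 over the build artefact."""
    return hashlib.sha256(enclave_build).hexdigest()


def is_trusted(reported: str, trusted: Set[str]) -> bool:
    """Client-side check: accept only fingerprints that were audited and distributed."""
    return reported in trusted


# The auditor reviews this build; its fingerprint is distributed to end users.
audited_build = b"task enclave v1"
trusted_fingerprints = {fingerprint(audited_build)}

# Any modification, however small, yields a different fingerprint.
modified_build = b"task enclave v1 with a one-line change"

print(is_trusted(fingerprint(audited_build), trusted_fingerprints))   # True
print(is_trusted(fingerprint(modified_build), trusted_fingerprints))  # False
```

In this model, the modified build is rejected until it has been audited again and its new fingerprint has been distributed.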

6. Operations

When you have your signed enclave files ready, you can configure and start the Sharemind HI Server:

# Create the server configuration
nvim server.yaml
# Start the Sharemind HI Server. We also provide a systemd service file.
sharemind-hi-server -c server.yaml

Once the Sharemind HI Server is running, the administrator can perform the following operations:

Monitoring

The Sharemind HI Server exports metrics via OpenTelemetry. We provide a docker-compose file that brings up a ready-made Grafana dashboard.

Data Backup

The data backup procedure consists of two bash commands.

Dataflow Configuration Upgrade

Whenever you modify the dataflow configuration, you need to repeat the relevant approval steps for the solution.

Recovery From Backup

Because it relies on Intel® SGX, the Sharemind HI Server is, in normal operation, tied to the same CPU for its whole lifetime. However, there is a manual recovery process which allows you to use the state of an "old" Sharemind HI Server on a new CPU.
Note: As of now, Sharemind HI does not implement a cold or hot standby feature, though this might be added in the future.

Inspecting the Audit Log

The Sharemind HI Server produces an encrypted, tamper-proof audit log which can be decrypted by stakeholders with the Auditor role. You should put an additional procedure in place to store the audit logs in an existing log-keeping service from which auditors can access them.
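
This overview does not describe how the audit log is protected against tampering. As general background only, one common technique for making a log tamper-evident is hash chaining, sketched below in Python. It is an assumption for illustration, not a description of the Sharemind HI implementation; the entry layout, the event fields and the function names are hypothetical, and encryption is left out.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev: str, event: dict) -> str:
    """Hash the previous entry's hash together with the canonical event."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev + payload).encode()).hexdigest()


def append(log: list, event: dict) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": entry_hash(prev, event)})


def verify(log: list) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"action": "data_upload", "topic": "input"})
append(log, {"action": "task_run", "task": "analysis_task"})
print(verify(log))  # True

log[0]["event"]["topic"] = "other"  # tamper with an old entry
print(verify(log))  # False
```

A chain like this only detects tampering if the latest hash is itself kept somewhere trustworthy, which is one reason to forward audit logs to a separate log-keeping service.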


1. Sharemind HI Task VMs are still in the design phase and not yet available.