Sharemind HI Overview
Sharemind HI is the right tool if you need to analyse data from multiple parties in a central place while keeping the risk of a data breach to a minimum. It is a development platform for the confidential analysis of multi-party data on a centralized server, with full control over how the data and results are exposed to others. It builds on Intel® Software Guard Extensions (Intel® SGX) and their strong cryptographic protection, and gives data owners a way to remotely ensure that their data is used only in ways they have agreed to.
Intel® SGX, a Trusted Execution Environment (TEE) technology, is at the core of Sharemind HI and is how Sharemind HI achieves privacy. It provides the means to protect data in transit, during processing and at rest. Sharemind HI hides most of the cryptography involved, so the developer can focus on the functionality inside the task enclaves and on the end-user side.
Sharemind HI is typically used in a solution in one of the following ways:
- Batch Analysis
One or more input providers upload data. The task is invoked to perform the analytics. When the task has finished, possibly after a long time, an output consumer downloads the results. This can be used for a one-off analysis, or to periodically ingest pseudonymised data and create reports over the de-pseudonymised data.
- Outsourcing Heavy Computations
Users outsource heavy computations on confidential data to a cloud environment. One example is training machine learning models on an individual user's data without sharing the data or the trained models with other users.
Sharemind HI realises access control by compartmentalizing data into topics and moving algorithms into tasks (task enclaves or the upcoming task VMs). The Dataflow Configuration then describes which user or task can read from or write to which topic. The result can be pictured as a graph that connects users and tasks to the topics they may read from or write to.
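To make the topic and task model more concrete, here is a minimal sketch of such a permission graph in Python. It is not the Sharemind HI API or the Dataflow Configuration format: the class names (`Topic`, `Dataflow`), the principals (`alice`, `bob`, `carol`, `analysis_task`) and the topic names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Topic:
    """A compartment of data with explicit writers and readers (hypothetical model)."""
    name: str
    producers: Set[str] = field(default_factory=set)  # users/tasks allowed to write
    consumers: Set[str] = field(default_factory=set)  # users/tasks allowed to read


class Dataflow:
    """Hypothetical in-memory stand-in for a dataflow configuration."""

    def __init__(self, topics):
        self.topics: Dict[str, Topic] = {t.name: t for t in topics}

    def can_write(self, principal: str, topic: str) -> bool:
        t = self.topics.get(topic)
        return t is not None and principal in t.producers

    def can_read(self, principal: str, topic: str) -> bool:
        t = self.topics.get(topic)
        return t is not None and principal in t.consumers


# Two input providers feed a task; only one consumer may read the report.
dataflow = Dataflow([
    Topic("input", producers={"alice", "bob"}, consumers={"analysis_task"}),
    Topic("report", producers={"analysis_task"}, consumers={"carol"}),
])

print(dataflow.can_write("alice", "input"))   # True
print(dataflow.can_read("carol", "report"))   # True
print(dataflow.can_read("carol", "input"))    # False
```

In this sketch anything that is not explicitly listed is denied, so `carol` can read the report but never the raw input, which is the kind of compartmentalization the text describes.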
A Sharemind HI solution consists of a set of ready-made, generic Sharemind HI components and new solution-specific functionality. The latter comprises the software that interacts with the Sharemind HI Server and the actual business logic that runs inside the Sharemind HI Server:
- The project-specific application/website/service
This component wraps the Sharemind HI Client to upload or download confidential data, invoke task enclaves and more. You may need to develop different variants of this component to match the needs of each stakeholder and their roles.
The Sharemind HI Client is available for multiple platforms as a C++ library and a TypeScript library, as well as a Linux CLI.
- The project-specific tasks
These enclaves or VMs contain the solution-specific business logic. They can access the raw confidential data uploaded by data providers. Thanks to Intel® SGX, this data is protected from any other user or software running on the same machine, even while the task enclaves are actively processing it.
Intel® SGX enclaves offer a restricted environment with a minimal attack surface, but most functionality must either be ported to the enclave environment or written from scratch. VMs make code reuse much easier, but a VM contains far more software and therefore has a larger attack surface.
We provide separate development and production builds. The "Hello World!" of Sharemind HI development is a Batch Analysis style project. Getting started takes the following steps:
# Create a CMake-based project template.
$ sharemind-hi-create-task-enclave-project --output /tep --…
# Edit the relevant files.
$ nvim /tep/test/dataflow-configuration.yaml
$ nvim /tep/src/enclave/Enclave.cpp
$ nvim /tep/test/client.sh
# Build and run the tests.
$ cd /tep/build
$ make -j4
$ ctest -V
When the task enclave code is ready, several steps are required to get it running in a production environment:
- Send the stakeholder certificates to the other stakeholders and end users.
- Create a release build of the enclaves.
- Have an auditor verify that the task enclave code and the dataflow configuration adhere to the specification.
- Distribute the trusted enclave fingerprints to stakeholders and end users (as well as to the client-side software).
- Create the server configuration.
- Have the enforcers approve the Sharemind HI Server and the dataflow configuration (DFC).
Note that any change to the task enclave code requires you to repeat most of the steps above and to perform a dataflow configuration upgrade.
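The reason a code change forces most of these steps to be repeated can be illustrated with a small Python sketch. It assumes that an enclave fingerprint behaves like a cryptographic hash of the enclave build; here SHA-256 over a byte string is used as a stand-in, whereas the real fingerprint is produced by the Intel® SGX tooling. The function names and the byte strings are hypothetical, and the sketch does not use the Sharemind HI Client.

```python
import hashlib
import hmac


def fingerprint(enclave_binary: bytes) -> str:
    """Stand-in fingerprint: SHA-256 hex digest of the enclave build (assumption)."""
    return hashlib.sha256(enclave_binary).hexdigest()


def is_trusted(reported: str, trusted: set) -> bool:
    """Check a reported fingerprint against the distributed allow-list."""
    return any(
        hmac.compare_digest(reported.encode("utf-8"), t.encode("utf-8"))
        for t in trusted
    )


# The auditor reviews one specific release build; its fingerprint is distributed.
audited_build = b"task enclave release build v1"
trusted_fingerprints = {fingerprint(audited_build)}

# Any later change to the code yields a different build and a different fingerprint.
patched_build = b"task enclave release build v2"

print(is_trusted(fingerprint(audited_build), trusted_fingerprints))  # True
print(is_trusted(fingerprint(patched_build), trusted_fingerprints))  # False
```

Because the patched build no longer matches any distributed fingerprint, it has to be audited and its fingerprint distributed again before stakeholders and end users would accept it.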
# Create the server configuration
# Start the Sharemind HI Server. We also provide a systemd service file.
$ sharemind-hi-server -c server.yaml
Once the Sharemind HI Server has started its life-cycle, the administrator can perform the following operations:
- Data Backup
The data backup procedure consists of two bash commands.
- Dataflow Configuration Upgrade
Whenever you modify the dataflow configuration, you need to repeat the relevant steps of approving the solution.
- Recovery From Backup
Due to the use of Intel® SGX, for normal operations the Sharemind HI Server is tied to the same CPU for its whole lifetime. However, there is a manual recovery process which allows you to use the state of an "old" Sharemind HI Server on a new CPU.
Note: As of now, Sharemind HI has not implemented any cold or hot stand-by feature, though this might be added in the future.
- Inspecting the Audit Log
The Sharemind HI Server produces an encrypted, tamper-proof audit log which can be decrypted by stakeholders with the Auditor role. An additional procedure should be put in place to store the audit logs in an existing log-keeping service from which auditors can access them.
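As background on what tamper protection for a log can mean, the following Python sketch shows a generic hash chain, one common way to make a log tamper-evident. It is an assumption for illustration only: the source does not say how Sharemind HI protects its audit log, encryption is left out entirely, and the event fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # fixed starting value for the chain


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash an event together with the hash of the previous record."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append(log: list, event: dict) -> None:
    """Append an event, linking it to the current end of the chain."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "hash": entry_hash(prev, event)})


def verify(log: list) -> bool:
    """Recompute the chain; any edited or reordered record breaks it."""
    prev = GENESIS
    for record in log:
        if record["hash"] != entry_hash(prev, record["event"]):
            return False
        prev = record["hash"]
    return True


audit_log = []
append(audit_log, {"actor": "alice", "action": "upload", "topic": "input"})
append(audit_log, {"actor": "analysis_task", "action": "run"})
print(verify(audit_log))  # True

audit_log[0]["event"]["actor"] = "mallory"  # simulate tampering
print(verify(audit_log))  # False
```

A chain like this only detects modification if the latest hash is kept somewhere the attacker cannot rewrite, which is one reason to copy audit logs into a separate log-keeping service.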