State & Files

1. Introduction

The Sharemind HI Server does not use a dedicated database but saves its state in regular files.

2. The Data Directory and Temp Directory

The Sharemind HI Server stores all its data in two directories:

Data Directory

Configured in the server.yaml file at Service.DataStore. The Sharemind HI Server expects that all data in this directory is persisted and, with minor exceptions, not modified by anyone. When this directory is backed up correctly, the Sharemind HI Server should always be able to restore completely from the last successfully written state.
Currently Sharemind HI supports reading through the POSIX file system API or through HDFS. However, with FUSE helpers you can integrate other storage types, such as S3 through goofys. Files are only ever read from start to finish and written in one go; the Sharemind HI Server performs no random IO or file truncation.

Temporary Directory

Configured in the server.yaml file at Service.TemporaryPath. This directory contains runtime data and can be cleared when restarting the Sharemind HI Server. However, unless otherwise stated, the files should not be modified or deleted while the Sharemind HI Server is running.
The Sharemind HI Server expects this directory to reside on a regular local file system, so FUSE helpers like goofys might not work here.
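
For orientation, a minimal server.yaml fragment configuring both directories could look as follows. The paths are placeholders, and the exact set of required keys depends on your deployment:

Service:
    DataStore: /var/lib/sharemind-hi/data        # the data directory
    TemporaryPath: /var/lib/sharemind-hi/temp    # the temporary directory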

In the following, we use $DataDir and $TempDir as placeholders for the data directory and the temporary directory, respectively.

3. Snapshot of the File System

To give you an idea of the files stored on disk, here is a (truncated) snapshot of the file system of a Sharemind HI Server after just a couple of actions.

3.1. Data Directory

drwxr-xr-x    - └── data
drwxr-x---    -    ├── enclave-state
drwxr-x---    -    │  ├── 0
.rw-r--r-- 8.1k    │  │  ├── core-enclave
.rw-r--r--  940    │  │  ├── core-enclave.recovery-key-shares
.rw-r--r-- 4.1k    │  │  ├── key-enclave
.rw-r--r--  936    │  │  ├── key-enclave.recovery-key-shares
.rw-r--r--   52    │  │  ├── server-state
.rw-r--r--   72    │  │  └── state-save-number
drwxr-x---    -    │  └── 1
.rw-r--r-- 8.6k    │     ├── core-enclave
.rw-r--r--  940    │     ├── core-enclave.recovery-key-shares
.rw-r--r-- 4.4k    │     ├── key-enclave
.rw-r--r--  936    │     ├── key-enclave.recovery-key-shares
.rw-r--r--   52    │     ├── server-state
.rw-r--r--   72    │     └── state-save-number
drwxr-x---    -    ├── encrypted-data-blobs
drwx------    -    │  ├── 7C88E9581A34BE1DF0D61F4467ECBFDC
.rw-r--r--   48    │  │  ├── bin
.rw-r--r--  763    │  │  ├── meta
.rw-r--r--  908    │  │  └── receivers
drwx------    -    │  ├── 57725944FA2844E2D483ED5601829AD1
.rw-r--r--   64    │  │  ├── bin
.rw-r--r--  763    │  │  ├── meta
.rw-r--r--  908    │  │  └── receivers
drwxr-x---    -    │  ├── topic-728E1927DB4AAE59CC0D293DDBF35A78
drwxr-x---    -    │  └── topic-1813A462A2D19C10975C8D7DD5AD5DBC
.rw-r--r--   16    │     ├── 000000000000
.rw-r--r--   16    │     └── 000000000001
drwxr-x---    -    ├── key-encryption-keys
drwx------    -    │  ├── 728E1927DB4AAE59CC0D293DDBF35A78
drwx------    -    │  └── 1813A462A2D19C10975C8D7DD5AD5DBC
.rw-r--r--  734    │     ├── 000000000000
.rw-r--r--  734    │     └── 000000000001
drwxr-x---    -    └── task-enclave-images
drwxr-x---    -       ├── 937BF5025DC6772FD77DB4EB72605ED3FE98CF76805A28D0C79A735415DA6C4F-4B882D3013CA4865F713D8B35B2843F2C088C28285DD12C95522A8063DCEF28D
.rw-r--r--  22M       │  └── libsharemind_hi_test_task_enclave.signed.so
drwxr-x---    -       └── 937BF5025DC6772FD77DB4EB72605ED3FE98CF76805A28D0C79A735415DA6C4F-F89079D6B3C52601ABA0337498AF425936AF77F2222E81E330E34AE85579673A
.rw-r--r--  20M          └── libsharemind_hi_sample_task_enclave.signed.so

3.2. Temporary Directory

drwxr-xr-x    - └── temp
.rw-r--r-- 321k    ├── audit-log
drwxr-x---    -    ├── core-enclave
drwxr-xr-x    -    │  └── task-enclave-images
drwxr-xr-x    -    │     └── sample_task
drwxr-xr-x    -    │        └── 937BF5025DC6772FD77DB4EB72605ED3FE98CF76805A28D0C79A735415DA6C4F-F89079D6B3C52601ABA0337498AF425936AF77F2222E81E330E34AE85579673A
.rw-r--r--  20M    │           └── libsharemind_hi_sample_task_enclave.signed.so
.rw-r--r--    0    ├── enclave-state-saving.LCK
drwxr-x---    -    ├── key-enclave
.rw-r--r--    5    ├── server-admin.port
.rw-r--r--    5    ├── server-user.port
drwxr-x---    -    └── task-enclaves
drwxr-x---    -       ├── sample_task
drwxr-x---    -       └── test_task

4. $DataDir/

4.1. enclave-state/

The state of the enclaves is stored in two directories (0 and 1) which are used like a double-buffer, ensuring that there is always a non-corrupt state to load from. The state-save-number file identifies which of the two directories is newer and non-corrupt: it is written last and contains a monotonically increasing number and a checksum. During startup, the server determines which of the two directories contains the larger valid state save number and loads the state from there.
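
Conceptually, the startup selection works like the following sketch. It is illustration only: it assumes a hypothetical read_and_validate_counter helper and a textual counter, whereas the real file is a binary format whose checksum is validated as well:

# Pick the enclave-state directory with the larger valid state save
# number (illustration; checksum validation is glossed over).
newest_state_dir() {
    local best_dir="" best_num=-1 d num
    for d in "$DataDir"/enclave-state/0 "$DataDir"/enclave-state/1; do
        num=$(read_and_validate_counter "$d/state-save-number") || continue
        if [ "$num" -gt "$best_num" ]; then
            best_num=$num
            best_dir=$d
        fi
    done
    echo "$best_dir"
}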

4.2. encrypted-data-blobs/

When a producer stores data in a topic, a new subdirectory within encrypted-data-blobs/ is created, named after the $UntrustedFileSystemId. Within it, the bin file contains the actual ciphertext.

Per topic, the directory topic-$TopicInternalIdentifier/ stores files named after the $DataId (the index within a given topic) whose content is the respective $UntrustedFileSystemId. This mapping is used for data retention purposes. Currently the Sharemind HI Server does not perform garbage collection of orphaned $UntrustedFileSystemId/ directories left behind by unfinished data uploads, though the $DataId → $UntrustedFileSystemId mapping could be used to identify such orphans.
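
For example, a read-only shell sketch along the following lines could enumerate candidate orphans. It assumes that the topic files store the ID as the raw bytes which are hex-encoded in the blob directory names; verify this against your deployment before acting on the output:

# List blob directories which are not referenced from any topic file.
cd "$DataDir/encrypted-data-blobs"
comm -23 \
    <(ls -d [0-9A-F]* 2>/dev/null | sort) \
    <(for f in topic-*/*; do xxd -p -u "$f"; done | sort -u)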

4.3. key-encryption-keys/

Stores, per $TopicInternalIdentifier/, all incoming key encryption keys in files named by a monotonically increasing counter. These are again used for data retention purposes.

4.4. task-enclave-images/

Upon startup, the Sharemind HI Server copies all task enclaves which are referenced from the server.yaml file into this directory. This avoids a situation during a DFC upgrade where a task enclave file is overwritten by an updated one while the Sharemind HI Server still uses the old task enclave.

Right now the Sharemind HI Server uses the $TempDir/task-enclaves/ helper directory to actually start the enclaves.

5. $TempDir/

5.1. audit-log

5.1.1. Location & Format

Audit logs are written to the file $TempDir/audit-log, where each line has the following format[1]:

<UNIX timestamp in seconds> <hex encoded `AuditLogEntry` flatbuffer>
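
For a quick look at the log, each line can be split into its two fields with standard tools; decoding the payload itself requires the AuditLogEntry flatbuffer schema (see the footnote). A minimal sketch using GNU date:

# Print human-readable timestamps next to a prefix of each payload.
while read -r ts payload; do
    printf '%s  %s...\n' "$(date -d "@$ts" --iso-8601=seconds)" "${payload:0:32}"
done < "$TempDir/audit-log"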

5.1.2. Persisting the Audit Log

The audit-log file is just one continuously growing file of audit log entries. You probably want to store the log entries in a more appropriate location and make them accessible to your auditors through your specific infrastructure.

The Sharemind HI Server opens the audit-log file at startup and keeps it open during regular operation. In order for the admin to perform maintenance on the audit-log file (e.g. using logrotate to move the audit log to persistent storage), they need to use the sharemind-hi-admin tool to temporarily pause the writing of audit log entries and close the audit-log file:

# Make sure that all entries are flushed to the file.
# Could also be part of the `prerotate` action for logrotate.
sharemind-hi-admin -c /path/to/server.yaml pause-audit-log-writing

# Your logic.
your-log-handling-tool-of-choice

# Re-open the file and resume writing of audit log entries.
# Could also be part of the `postrotate` action for logrotate.
sharemind-hi-admin -c /path/to/server.yaml resume-audit-log-writing
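
Tying this into logrotate could look like the following sketch; the log path and configuration values are placeholders for your deployment:

# /etc/logrotate.d/sharemind-hi (illustrative)
/path/to/temp/audit-log {
    daily
    rotate 30
    missingok
    prerotate
        sharemind-hi-admin -c /path/to/server.yaml pause-audit-log-writing
    endscript
    postrotate
        sharemind-hi-admin -c /path/to/server.yaml resume-audit-log-writing
    endscript
}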

Note: Right now there is no synchronization between the writing of the enclave state to disk and the writing of the audit log to disk.

More details about the audit log entries themselves are in the Audit Log section.

5.2. core-enclave/

The core enclave creates copies of the task enclave files from the persistent storage. Otherwise, the core enclave does not create further temporary files.

5.3. key-enclave/

Reserved for temporary files created by the key enclave. As of now, the key enclave does not create any.

5.4. task-enclaves/

Temporary files created by the respective task enclaves, e.g. as part of the Streams API.

5.5. enclave-state-saving.LCK

A helper file which is used for proper back-up of the $DataDir (see the Saving the State section below).

5.6. server-admin.port, server-user.port

These files contain the TCP port numbers where the gRPC server is listening for incoming connections. Although the port numbers can be configured in the Server and AdministratorServer sections of the server.yaml file, these files have a practical purpose:

  • The Sharemind HI Server only creates the files after the gRPC server has been initialized, so you can wait until the files are created before connecting to the Sharemind HI Server.

  • If you configure port number 0, then the kernel assigns some random valid port number. In that case the assigned port numbers can be read from these files (see the sketch below).
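
A minimal sketch of waiting for the user-facing port, assuming $TempDir is exported in the shell:

# Wait until the server has created the port file, then read the port.
port_file="$TempDir/server-user.port"
until [ -s "$port_file" ]; do sleep 0.1; done
port=$(cat "$port_file")
echo "Sharemind HI Server is listening on port $port"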

6. Saving the State

The state can be saved in three different ways:

Automatically through an inbuilt timer

The Sharemind HI Server saves the state periodically, as often as configured by the Service.StateSavePeriod option in the server.yaml file.

Manually through special topics

Triggering the state save through topic uploads can be used for stakeholder-driven transactional consistency with external services and data stores.
A topic can be configured to trigger a state save whenever data is added to it, using the TriggerStateSaveOnUpload option for that topic in the dataflow configuration (a schematic fragment follows this list). Note that any data added to that topic only becomes visible after the state has been successfully saved. This means that a dataUpload action on the client side blocks until the state has been saved, and a task invoked by taskRun only finishes once the state has been saved. Further note that if a task instance is blocked by some topic A on writing the state to disk, but it also writes to another topic B which does not have the TriggerStateSaveOnUpload option set, then everyone else who writes to topic B is also blocked behind the state saving process. This is done to preserve the correct (observable) order in the topics.

Manually through the sharemind-hi-admin tool

Administrators might want to trigger a state save before they proceed to back up the state, so as to get the most recent view of the data. For this they can use the sharemind-hi-admin tool, which blocks until the state has been saved. Note that the periodic state saving timer is reset, i.e. the next automatically initiated state save only happens StateSavePeriod after the sharemind-hi-admin tool triggered the state save.
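
Schematic configuration fragments for the first two mechanisms. Only the StateSavePeriod and TriggerStateSaveOnUpload keys are taken from this document; the surrounding structure and the values shown are illustrative:

# server.yaml (the duration syntax shown is hypothetical):
Service:
    StateSavePeriod: 15min

# dataflow configuration (the surrounding keys are illustrative):
Topics:
    - Name: input_topic
      TriggerStateSaveOnUpload: true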

While the state is being saved, the enclave-state-saving.LCK file is locked to synchronise with external back-up procedures.
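
An external back-up procedure could therefore take the same lock while copying, along the lines of the following sketch. It assumes flock(2) semantics on the lock file, and the state-save subcommand name is hypothetical; consult the sharemind-hi-admin documentation for the real invocation:

# Trigger a fresh state save first (hypothetical subcommand name).
sharemind-hi-admin -c /path/to/server.yaml trigger-state-save

# Copy $DataDir while holding the lock, so that a concurrent state
# save cannot interleave with the copy.
flock "$TempDir/enclave-state-saving.LCK" \
    rsync -a "$DataDir/" /backups/sharemind-hi/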

7. Example use of goofys

TODO the Architecture/ section does not seem to be the correct place for this section.

goofys allows mounting an S3 bucket as a POSIX-ish file system. This way the popular S3 storage can be used with Sharemind HI. Since goofys only supports a restricted set of file operations, we test Sharemind HI with goofys to make sure that we don’t use anything unsupported. Specifically, a read operation reads the full file and a write operation writes the full file; there are no random reads or writes, and files are not truncated.

Testing Sharemind HI with an S3 bucket can be done e.g. as follows, using localstack:

Listing 1. docker-compose.yml file for starting localstack
version: '3.7'
services:
  localstack:
    image: localstack/localstack
    container_name: localstack_service
    ports:
      - "4566:4566"
      - "4510-4559:4510-4559"
      - "8055:8080"
    environment:
      - SERVICES=s3
      - DEBUG=1
      - DATA_DIR=/tmp/localstack/data
    volumes:
      - ./tmp/localstack:/tmp/localstack
      - /var/run/docker.sock:/var/run/docker.sock
networks:
  default:
    name: mock_demo
Listing 2. S3 bucket setup
# Start localstack service
docker-compose --file docker-compose.yml up -d

# Install `aws` tool for managing the S3 storage.
python3 -m venv env
source env/bin/activate
python3 -m pip install awscli
# Download goofys from github
xdg-open https://github.com/kahing/goofys

# Create a new profile (files in ~/.aws/)
aws configure --profile localstack
# AWS Access Key ID [None]: user
# AWS Secret Access Key [None]: key
# Default region name [None]:
# Default output format [json]: json

bucket_name="hi-bucket"
localstack_url="http://localhost:4566"

# Create and mount the bucket.
aws \
    --profile localstack --endpoint-url="$localstack_url" \
    s3api create-bucket --bucket "$bucket_name"
goofys \
    --profile localstack --endpoint "$localstack_url" \
    --http-timeout 2s -f "$bucket_name" "$DataDir" &

# Perform some workload with Sharemind HI
sharemind-hi-server ...

# Maybe delete the bucket when you are finished.
aws \
    --profile localstack --endpoint-url="$localstack_url" \
    s3 rb --force "s3://$bucket_name" >/dev/null

1. You can find the definition of the AuditLogEntry table in the file enclave_messages_common.fbs of your Sharemind HI development bundle.