Rmind.Version

Description

Provides information about the version of Rmind.

accuracy

Description

Calculates accuracy score for classification model.

Arguments

actual

private vector of ground truth labels

predicted

private vector of predicted labels
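
As a plain-Python illustration of the metric (not Rmind code, and run on public lists rather than private vectors), accuracy can be read as the fraction of positions where the predicted label equals the ground truth label. The helper name accuracy_score is hypothetical.

```python
def accuracy_score(actual, predicted):
    """Fraction of positions where predicted equals actual (public data only)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    # Count the positions where both label vectors agree.
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


print(accuracy_score([1, 0, 1, 1], [1, 0, 0, 1]))  # 3 of 4 labels match
```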

acf

Description

Calculates the autocorrelation of a private vector.

Usage

acf(x)

Arguments

x

a private vector

add.days

Description

Add a number of days to a date.

Arguments

date

date

offset

number of days added (can be negative)
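
The effect of a positive or negative offset can be sketched in plain Python with the standard datetime module. This is only an analogy on public values; the helper name add_days is hypothetical and says nothing about how Rmind represents dates internally.

```python
from datetime import date, timedelta


def add_days(d, offset):
    """Return the date that lies `offset` days after `d` (before it if negative)."""
    return d + timedelta(days=offset)


print(add_days(date(2014, 9, 4), 30))   # moves forward into October
print(add_days(date(2014, 9, 4), -4))   # moves back into August
```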

aggregate

Description

Compute summaries of subsets of data. The result table will also contain the grouped key columns. If the ds and table arguments are not specified a temporary table is created.

Note that the “count” function does not require a data vector. Average is not supported on logical vectors. Average and sum are not supported on bitwise-shared (xor) types.

Note that the order of rows in the output is not deterministic.

Usage

aggregate(key.columns, data.columns, functions)

aggregate(key.columns, data.columns, functions, ds=NULL, table=NULL, names=NULL, overwrite=FALSE)

Arguments

by

a private vector or a list of private vectors. The groups will be formed based on these vectors. The vectors must have an integral type.

data

a list of private vectors. Summaries will be computed from these vectors.

funs

a list of strings with names of the functions that will be used to compute summaries. The possible values are “avg” (average), “count” (counts the number of values in a group), “first”, “last”, “max”, “min”, “sum”, “all” (conjunction of all values), “any” (disjunction of all values).

ds

(optional) name of the data source of the result table

table

(optional) name of the result table

names

a list of column names of the result table

overwrite

whether to overwrite table with the name given in argument table
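
The grouping semantics described above can be sketched in plain Python on public lists. This is an illustration of the idea only: the function name aggregate_sketch and the dictionary result are assumptions, and the real procedure works on private vectors and returns a table.

```python
from collections import defaultdict


def aggregate_sketch(by, data, fun):
    """Group `data` by the keys in `by` and reduce each group with `fun`."""
    reducers = {
        "avg": lambda v: sum(v) / len(v),
        "count": len,
        "first": lambda v: v[0],
        "last": lambda v: v[-1],
        "max": max,
        "min": min,
        "sum": sum,
    }
    if fun not in reducers:
        raise ValueError("unsupported function: " + fun)
    groups = defaultdict(list)
    for key, value in zip(by, data):
        groups[key].append(value)
    # One summary value per distinct key.
    return {key: reducers[fun](values) for key, values in groups.items()}


keys = [1, 2, 1, 2, 1]
values = [10, 20, 30, 40, 50]
print(aggregate_sketch(keys, values, "sum"))
print(aggregate_sketch(keys, values, "count"))
```

Comparing results as dictionaries rather than as ordered rows mirrors the note that the row order of the output is not deterministic.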

append

Description

Appends elements to a vector or a list. Note that it mutates the sequence and returns its value; it does not return a new sequence.

Usage

append(x, value)

append(x, value, after=NULL)

Arguments

x

vector or list

value

value to be appended

after

the index after which the value is appended

attach

Description

Bring column vectors of a Sharemind database table into scope. Note that existing variables with the same names as the columns will be reassigned.

Arguments

x

Sharemind table

auc

Description

Compute the area under the receiver operating characteristic curve.

Arguments

predictions

private float vector of classifier predictions

labels

private integral vector of true labels (0 or 1)

bonferroni

Description

Perform the Bonferroni correction for multiple testing. Divides the significance level by the number of hypotheses.

Arguments

significance

significance level

hypotheses

number of hypotheses
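
The correction is a single division. A minimal Python sketch follows; the function name bonferroni_sketch is hypothetical, and the input check is an addition for the illustration rather than documented behaviour.

```python
def bonferroni_sketch(significance, hypotheses):
    """Divide the significance level by the number of hypotheses."""
    if hypotheses < 1:
        raise ValueError("hypotheses must be a positive integer")
    return significance / hypotheses


# Each of 10 tests is then evaluated at the corrected level.
print(bonferroni_sketch(0.05, 10))
```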

boxplot

Description

Plots a box and whiskers plot of a private vector. If the first argument is a list of private vectors, a plot is produced for each of them.

Usage

boxplot(x)

boxplot(x, ylim=NULL, title="", label="", tic.labels=NULL, ratio=4/3)

Arguments

x

a private vector or a list of private vectors

ylim

a two-element vector specifying the minimum and maximum values of the y axis

title

plot title

label

y axis label

tic.labels

a list containing the labels of the input data sets

ratio

width/height ratio

c

Description

Attempts to construct a vector of the arguments passed to it. If the arguments are numeric, they are coerced to the same type and concatenated. If the arguments have types that cannot be coerced, a list of the arguments is created.

Arguments

objects to concatenate

cast

Description

Convert a private vector to another type.

Arguments

x

a private vector

type

target type

cat

Description

Print out a concatenated string representation of values.

Usage

cat(1, 2, 3)

cat(1, 2, 3, sep=" ")

Arguments

values

sep

separator string

cbind

Description

Concatenate tables by columns. If the result.ds and result.table arguments are not given, a temporary table is created.

Usage

cbind(table1, table2)

cbind(table1, table2, result.ds=NULL, result.table=NULL, result.overwrite=FALSE)

Arguments

Positional arguments are tables or lists of columns designating tables. Keyword arguments are private vectors, where the keyword names the new column.

result.ds

(optional) data source of the result table

result.table

(optional) name of the result table

result.overwrite

whether to overwrite table with the name given in argument “result.table”

Example

# Add column consisting of zeroes and named x
t1 <- cbind(t, x=classify(0))

ceiling

Description

For each element of a private vector, ‘ceiling’ finds the smallest integer that is not less than the element. The resulting vector will have an integral type.

Arguments

x

a private vector

chisq.test

Description

Perform Pearson’s chi-squared test of independence or goodness of fit test.

The x and y arguments are data vectors. Note that booleans and signed integers will be converted to an unsigned integral type so the data should not contain negative values.

If only the first input is supplied it is handled as a vector of frequencies and a goodness of fit test is performed. You can supply the expected probabilities as the p argument.

If both x and y are supplied they are handled as factors and cross tabulated. Pearson’s test of independence is performed on the contingency table. If the inputs are integers not factors then you must specify the possible levels as public vectors using the xlevels and ylevels arguments.

The test returns TRUE if the null hypothesis is rejected.

Usage

chisq.test(x, y=NULL, xlevels=NULL, ylevels=NULL, significance=0.05, p=NULL)

Arguments

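x

private vector of data. If y is not supplied, x is handled as a vector of frequencies.

y

(optional) private vector of data

xlevels

(optional) public vector of the possible levels of x

ylevels

(optional) public vector of the possible levels of y

significance

significance level

p

(optional) expected probabilities for the goodness of fit test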

classify

Description

Create a private vector from a public vector.

Arguments

x

a public vector

type

target type. Xor-shared types are not supported.

close

Description

Close a file handle.

Usage

close(handle)

Arguments

handle

file handle

columns

Description

Returns a list with private column vectors of a Sharemind database table.

Arguments

x

Sharemind table

cov

Description

Calculates the covariance of two private vectors.

Arguments

x

private vector

y

private vector

ctree.importance

Description

Returns [information gain, important feature, its threshold] for each node.

For example, when the max.depth in ctree() is three, this function returns a list with seven elements. The number of elements (nodes) is decided as 2 ^ max.depth - 1 (binary tree). Node number is based on breadth-first search and node1 is the root.

Usage

ctree.importance(result)

Arguments

result

the result of ctree()
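
The node count and the breadth-first numbering can be sketched in Python. The child formula assumes the usual left-to-right numbering of a complete binary tree with the root as node 1; the text above only states that the numbering is breadth-first, so treat the children helper as an assumption. Both function names are hypothetical.

```python
def node_count(max_depth):
    """Number of nodes in a full binary tree of the given depth: 2 ^ max.depth - 1."""
    return 2 ** max_depth - 1


def children(node):
    """Children of a node under 1-based breadth-first numbering (assumed layout)."""
    return 2 * node, 2 * node + 1


print(node_count(3))   # size of the list returned for max.depth = 3
print(children(1))     # the two nodes directly below the root
```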

ctree

Description

Fits decision tree classifier.

The function returns a list with six elements. The elements are:

  • algo - this function name;

  • coefficients - a private matrix indicating variable importance. To get each node information, use ctree.importance();

  • max.depth - a number indicating the maximum depth of the tree;

  • min.label - a number indicating the minimum label, needed for prediction;

  • ml.type - a string indicating the machine learning type, needed for prediction;

  • variable.names - names of the variables in the model, without the dependent variable;

Usage

ctree(model)

ctree(model, max.depth=3)

Arguments

model

linear model formula. Scaling is not necessary.

max.depth

a number indicating the maximum depth of the tree

Example

# Using IRIS datasets. y.train is "species" column.
result <- ctree(
    y.train ~ X.train$sl + X.train$sw + X.train$pl + X.train$pw,
    max.depth=3
)

Supported

predict

date

Description

Convert a string date into a numeric representation. The syntax of the format string is specified in the documentation of the Haskell time package. It’s similar to standard UNIX formatting, so “man strftime” should be helpful. Throws an exception if parsing fails.

Arguments

format

format string

date

date string

Example

# Filter dates before 4th of September 2014
a[a < date("%d/%m/%Y", "04/09/2014")]
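
Because the format syntax resembles strftime, the same format string can be tried out in Python's datetime.strptime before using it in Rmind. This is only an analogy: the helper name parse_date is hypothetical, it mirrors the (format, date) argument order used above, and it works on public values.

```python
from datetime import date, datetime


def parse_date(fmt, text):
    """Parse `text` with a strftime-style format; raises ValueError on failure."""
    return datetime.strptime(text, fmt).date()


cutoff = parse_date("%d/%m/%Y", "04/09/2014")
dates = [date(2014, 8, 31), date(2014, 9, 4), date(2014, 10, 1)]
# Keep only the dates strictly before the cutoff, as in the filter above.
earlier = [d for d in dates if d < cutoff]
print(earlier)
```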

day

Description

Extract days from dates. The result is an int64 private vector.

Arguments

x

private vector containing dates

difftime

Description

Compute the difference of dates in days.

Arguments

time1

vector of dates

time2

vector of dates

dlmConfidenceInterval

Description

Returns the ceiling and floor values of the confidence interval based on the given z.

The function returns a named list with two elements. The elements are:

  • ceiling - a vector indicating the ceiling mean values for z

  • floor - a vector indicating the floor mean values for z

Usage

# no default values
dlmConfidenceInterval(mean, var, z)

Arguments

mean

a vector indicating the mean values

var

a vector indicating the variances

z

a number indicating the critical value

Example

# for 95% confidence interval
filtered.mod <- dlmFilter(y, dlmModPoly(order=1))
result <- dlmConfidenceInterval(filtered.mod$m, filtered.mod$C, 1.96)

# plotting example
figure <- plot(
  c(1: length(filtered.mod$m)),
  list(
    filtered.mod$m,
    result$ceiling,
    result$floor
  ),
  title="Kalman Filter: local level",
  type="l",
  ratio=4
)
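
A plausible reading of the ceiling and floor values is mean plus or minus z times the standard deviation. The Python sketch below works under that assumption on public lists; the formula is not confirmed by this reference, and the function name confidence_interval_sketch is hypothetical.

```python
import math


def confidence_interval_sketch(mean, var, z):
    """Assumed bounds: mean +/- z * sqrt(var), computed element by element."""
    half_widths = [z * math.sqrt(v) for v in var]
    return {
        "ceiling": [m + h for m, h in zip(mean, half_widths)],
        "floor": [m - h for m, h in zip(mean, half_widths)],
    }


bounds = confidence_interval_sketch([10.0, 12.0], [4.0, 1.0], 1.96)
print(bounds["ceiling"])
print(bounds["floor"])
```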

dlmFilter

Description

Returns the results of Kalman Filter for a DLM which is specified by either dlmModPoly or dlmModJoint.

The function returns a named list.

When mod is the result of dlmModPoly(order=1), the elements are:

  • mod - the updated DLM. A private matrix.

  • m - public filtered values of the state vectors (local level)

  • C - public filtered values of the variances of the estimation errors

When mod is the result of dlmModPoly(order=2), the elements are:

  • mod - the updated DLM. Another named list with private vectors or matrices.

  • m0 - public filtered values of the state vectors (for local level component)

  • m1 - public filtered values of the state vectors (for local trend component)

  • C0 - public filtered values of the variances of the estimation errors (for local level component)

  • C1 - public filtered values of the variances of the estimation errors (for local trend component)

When mod is the result of dlmModJoint, the elements are:

  • mod - the updated DLM. Another named list with private vectors or matrices.

  • m0 - public filtered values of the state vectors (for local level component)

  • m1 - public filtered values of the state vectors (for local trend component)

  • m2 - public filtered values of the state vectors (for local cycle component)

  • C0 - public filtered values of the variances of the estimation errors (for local level component)

  • C1 - public filtered values of the variances of the estimation errors (for local trend component)

  • C2 - public filtered values of the variances of the estimation errors (for local cycle component)

Usage

# no default values
dlmFilter(y, mod)

Arguments

y

a private vector

mod

the result of either dlmModPoly or dlmModJoint

Supported

  • dlmLL

  • dlmForecast

  • dlmSmooth

  • dlmConfidenceInterval

Supporting

  • dlmModPoly

  • dlmModJoint
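
For orientation, the textbook Kalman filter for a local level model (the order=1 case) can be sketched in Python on public floats. This is not Rmind's implementation, which operates on a private vector and may differ in detail; the function name local_level_filter is hypothetical, and the default values mirror those shown for dlmModPoly(order=1).

```python
def local_level_filter(y, dV=1.0, dW=1.0, m0=0.0, C0=1e7):
    """Textbook local level Kalman filter; returns filtered means and variances."""
    m, C = m0, C0
    means, variances = [], []
    for obs in y:
        a = m              # one-step-ahead state prediction
        R = C + dW         # variance of the state prediction
        Q = R + dV         # variance of the observation prediction
        K = R / Q          # Kalman gain
        m = a + K * (obs - a)
        C = R - K * R
        means.append(m)
        variances.append(C)
    return {"m": means, "C": variances}


result = local_level_filter([5.0] * 20)
print(result["m"][-1], result["C"][-1])
```

With a very large C0 the first filtered value is dominated by the first observation, which is why a diffuse prior such as 1e7 is a common default.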

dlmForecast

Description

Returns the expected value and variance of future system states.

The function returns a named list.

When filtered mod is based on dlmModPoly(order=1), the elements are:

  • a - public smoothed values of the expected future states

  • R - public smoothed values of the variances of the expected errors

When filtered mod is based on any other DLM, the elements are:

  • a0 - public smoothed values of the expected future states (for local level component)

  • a1 - public smoothed values of the expected future states (for local trend component)

  • R0 - public smoothed values of the variances of the expected errors (for local level component)

  • R1 - public smoothed values of the variances of the expected errors (for local trend component)

Usage

# no default values
dlmForecast(mod, nAhead)

Arguments

mod

a filtered DLM (the result of dlmFilter)

nAhead

the number of steps ahead for which a forecast is requested. A positive integer.

Example

filtered.mod <- dlmFilter(y, dlmModPoly(order=1))
result <- dlmForecast(filtered.mod$mod, nAhead=10)

Supported

  • dlmConfidenceInterval

Supporting

  • dlmFilter

dlmLL

Description

Returns the negative log-likelihood of a filtered DLM.

The lower the value, the better.

Usage

# no default values
dlmLL(mod)

Arguments

mod

a filtered DLM (the result of dlmFilter)

Example

filtered.mod <- dlmFilter(y, dlmModPoly(order=1))
result <- dlmLL(filtered.mod$mod)

Supporting

  • dlmFilter

dlmLiuWestFilter

Description

Applies the Liu-West filter with Rao-Blackwellization to a DLM specified by dlmModPoly(order=1).

The argument mod needs to be the result of dlmModPoly(order=1), but the dV and dW values will be ignored.

The function returns a named list. The elements are:

  • m - public filtered values of the state vectors (local level)

  • C - public filtered values of the variances of the estimation errors

  • V - estimated variance of the observation noise

  • W - estimated diagonal elements of the variance matrix of the system noise

Once the values V and W are confirmed to converge to constant values, it is recommended to pass those values to dlmModPoly() and apply dlmFilter().

Usage

# no default values
dlmLiuWestFilter(y, dlmModPoly(order=1), nParticle=100)

Arguments

y

a private vector

mod

the result of dlmModPoly(order=1)

nParticle

the number of particles

dlmModJoint

Description

Combines dlmModPoly(order=2) and dlmModSeas (or dlmModTrig). Only order=2 for dlmModPoly is supported.

Usage

dlmModJoint(dlmModPoly(order=2), dlmModSeas())

Arguments

mod1

a polynomial DLM

mod2

a seasonal or periodic DLM

Supported

  • dlmFilter

dlmModPoly

Description

Returns an Nth-order polynomial DLM (Dynamic Linear Model).

The function returns a named list with six elements. The elements are:

  • order - a number (N), either 1 or 2, indicating the order of the polynomial model;

  • component - a string indicating the component of the DLM;

  • dV - a number indicating the variance of the observation noise;

  • dW - a number or a double-type vector, indicating the diagonal elements of the variance matrix of the system noise;

  • m0 - a number or a double-type vector, indicating the element(s) of the expected value of the pre-sample state vector;

  • C0 - a number or a double-type vector, indicating the element(s) of the variance matrix of the pre-sample state vector;

Usage

dlmModPoly(1)
dlmModPoly(2)

dlmModPoly(order=1, dV=1, dW=1, m0=0, C0=1e7)
dlmModPoly(order=2, dV=1, dW=c(1, 0), m0=c(0, 0), C0=c(1e7, 1e7))

Arguments

order

a number indicating the order of the polynomial model (either 1 or 2)

dV

a number indicating the variance of the observation noise

dW

a number or a double-type vector, indicating the diagonal elements of the variance matrix of the system noise

m0

a number or a double-type vector, indicating the element(s) of the expected value of the pre-sample state vector

C0

a number or a double-type vector, indicating the element(s) of the variance matrix of the pre-sample state vector

Supported

  • dlmModJoint

  • dlmFilter

dlmModSeas

Description

Returns a DLM representing a specified seasonal component.

The function returns a named list with six elements. The elements are:

  • order - a number (N) indicating the order of the seasonal model (frequency - 1);

  • component - a string indicating the component of the DLM;

  • dV - a number indicating the variance of the observation noise;

  • dW - a vector indicating the diagonal elements of the variance matrix of the system noise;

  • m0 - a vector indicating the element(s) of the expected value of the pre-sample state vector;

  • C0 - a vector indicating the element(s) of the variance matrix of the pre-sample state vector;

For filtering, use dlmModJoint before dlmFilter.

Usage

# no default values
dlmModSeas(4)
dlmModSeas(frequency=4, dV=1, dW=c(1, 0, 0), m0=c(0, 0, 0), C0=c(1e7, 1e7, 1e7))

Arguments

frequency

a number indicating the number of seasons

dV

a number indicating the variance of the observation noise

dW

a vector indicating the diagonal elements of the variance matrix of the system noise

m0

a vector indicating the element(s) of the expected value of the pre-sample state vector

C0

a vector indicating the element(s) of the variance matrix of the pre-sample state vector

Supported

  • dlmModJoint

dlmModTrig

Description

Returns a DLM representing a specified periodic component.

The function returns a named list with eight elements. The elements are:

  • order - a number (N) indicating the order of the periodic model (2 * q - 1);

  • component - a string indicating the component of the DLM;

  • s - a number indicating the period;

  • q - a number indicating the number of harmonics in the DLM;

  • dV - a number indicating the variance of the observation noise;

  • dW - a number indicating the diagonal elements of the variance matrix of the system noise;

  • m0 - a vector indicating the element(s) of the expected value of the pre-sample state vector;

  • C0 - a vector indicating the element(s) of the variance matrix of the pre-sample state vector;

For filtering, use dlmModJoint before dlmFilter.

Usage

# no default values
dlmModTrig(4, 2)
dlmModTrig(s=4, q=2, dV=1, dW=0, m0=c(0, 0, 0), C0=c(1e7, 1e7, 1e7))

Arguments

s

a number indicating the period

q

a number indicating the number of harmonics in the DLM. Must be an even number and not exceed s/2.

dV

a number indicating the variance of the observation noise

dW

a number indicating the diagonal elements of the variance matrix of the system noise

m0

a vector indicating the element(s) of the expected value of the pre-sample state vector

C0

a vector indicating the element(s) of the variance matrix of the pre-sample state vector

Supported

  • dlmModJoint

dlmSmooth

Description

Returns the results of Kalman Smoother using a filtered DLM.

The function returns a named list.

When filtered mod is based on dlmModPoly(order=1), the elements are:

  • s - public smoothed values of the state vectors

  • S - public smoothed values of the variances of the smoothing errors

When filtered mod is based on any other DLM, the elements are:

  • s0 - public smoothed values of the state vectors (for local level component)

  • s1 - public smoothed values of the state vectors (for local trend component)

  • S0 - public smoothed values of the variances of the smoothing errors (for local level component)

  • S1 - public smoothed values of the variances of the smoothing errors (for local trend component)

Usage

# no default values
dlmSmooth(mod)

Arguments

mod

a filtered DLM (the result of dlmFilter)

Example

filtered.mod <- dlmFilter(y, dlmModPoly(order=1))
result <- dlmSmooth(filtered.mod$mod)

Supported

  • dlmConfidenceInterval

Supporting

  • dlmFilter

do.call

Description

Calls a function on a list of arguments.

Arguments

fun

function

args

list of arguments

double

Description

Creates a vector of floating point numbers.

Arguments

length

length of the vector

element.of

Description

Check if each element of a private vector is an element of a set.

Arguments

x

private vector

set

public vector representing the set

erf

Description

Compute the error function of a numeric vector.

Arguments

x

public or private vector

exp

Description

Compute the exponential function of a numeric vector.

Arguments

x

public or private vector

f1

Description

Calculates the F1 score for a classification model.

Usage

f1(actual, predicted)

f1(actual, predicted, average="macro")

Arguments

actual

private vector of ground truth labels

predicted

private vector of predicted labels

average

a string. It does not need to be specified for binary classification. For multi-class classification, choose one of “macro”, “micro”, or “weighted”.

factor

Description

Create a private factor vector from a list of public strings.

The mapping argument specifies how the string levels are mapped to integer codes. This is an optimisation used to speed up computations with factors in Sharemind. An example of the mapping argument would be list(a=1, b=2) when there are two levels (“a” and “b”). The codes must be sequential and start from 1.

You can get the factor mapping of an existing factor using levels. This allows you to create a new factor vector with the same mapping as an existing one like this:

factor(list("a"), levels(x))

Usage

factor(x)

factor(x, list(a=1, b=2))

Arguments

x

list of strings

mapping

either NULL or list of level=code pairs
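
The mapping rule (codes are sequential and start from 1) can be illustrated in plain Python. The sketch works on public strings only; the function name encode_factor is hypothetical, and the default mapping built from the sorted distinct values is an assumption, since this reference does not say how levels are ordered when no mapping is given.

```python
def encode_factor(values, mapping=None):
    """Map string levels to integer codes and validate the mapping."""
    if mapping is None:
        # Assumed default: number the distinct levels in sorted order.
        levels = sorted(set(values))
        mapping = {level: code for code, level in enumerate(levels, start=1)}
    codes = sorted(mapping.values())
    if codes != list(range(1, len(codes) + 1)):
        raise ValueError("codes must be sequential and start from 1")
    return [mapping[v] for v in values], mapping


codes, used_mapping = encode_factor(["a", "b", "a"], {"a": 1, "b": 2})
print(codes)
```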

fft

Description

Calculates the Discrete Fourier Transform (DFT) with the Fast Fourier Transform (FFT) algorithm.

This function returns a named list with two elements: real and imag. Both are vectors.

Warning: only vectors whose length is a power of 2 are supported. If the input vector length is not a power of 2, the vector is truncated to the largest power of 2 that does not exceed its length (e.g. if the length is 100, only the first 64 elements are used).

Usage

fft(z)

Arguments

z

a private vector (real number only)
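
To predict how many elements of an input will actually be transformed, the largest power of 2 not exceeding the length can be computed as in this Python sketch. The helper name truncated_length is hypothetical.

```python
def truncated_length(n):
    """Largest power of 2 that does not exceed n (n must be at least 1)."""
    if n < 1:
        raise ValueError("length must be positive")
    return 1 << (n.bit_length() - 1)


print(truncated_length(100))   # matches the length-100 example in the warning
print(truncated_length(64))    # a power of 2 is left unchanged
```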

file

Description

Opens a file handle which can be used to read the file or write to the file.

Usage

file(path)

file(path, mode="rw")

Arguments

path

a string containing the path of the file

mode

a string indicating the IO mode. ‘rw’ is reading and writing, ‘r’ is reading, ‘w’ is writing, ‘a’ is append.

fisher.test

Description

Perform Fisher’s exact test to test for association between two binary variables. The numeric values 0 and 1 are always taken as the levels so make sure to convert your inputs to this format. This is performed implicitly when one of the inputs is a factor with two levels. Returns TRUE if the null hypothesis is rejected and FALSE otherwise.

Usage

fisher.test(a, b)

fisher.test(a, b, significance=0.05)

Arguments

a

first input

b

second input

significance

significance level

floor

Description

For each element of a private vector, floor finds the largest integer that is not greater than the element. The resulting vector will have an integral type.

Arguments

x

a private vector

flush

Description

Send unfinished writes to a file handle to the operating system.

Usage

flush(handle)

Arguments

handle

file handle

freq

Description

Calculates the frequency table of a private vector.

If the pretty argument is TRUE the frequency table is printed to the output. Otherwise a list of the counts or frequencies is returned which are named according to the factor levels.

Usage

freq(x)

freq(x, freq=TRUE, pretty=FALSE)

Arguments

x

private vector

freq

whether to output absolute counts (TRUE) or percentages

pretty

pretty print the frequency table

freqplot

Description

Plots a histogram of a private factor vector or a public list of bin counts. Unlike hist, which calculates frequencies for ranges, freqplot counts the frequency of each distinct value, so it’s more useful for discrete data.

Usage

freqplot(x)

freqplot(x, title=NULL, xlab=NULL, ylab=NULL, xtics=NULL, freq=TRUE, ratio=4/3)

Arguments

x

private factor vector or list of counts

title

plot title

xlab

x axis label

ylab

y axis label

xtics

list of string labels for x axis tics

freq

whether to output absolute counts (TRUE) or percentages

ratio

width/height ratio

glm

Description

Fits generalized linear models. Only two types of models are currently supported: regular linear regression (if family is “gaussian”) and logistic regression (if family is “binomial-logit”). The documentation of lm describes the different methods of solving systems of linear equations. The function returns a list with six elements. The elements are:

  • algo - this function name;

  • coefficients - fitted coefficients (the first element is the intercept);

  • family - a string indicating the model type;

  • ml.type - a string indicating the machine learning type, needed for prediction;

  • model - list of data vectors (the first one is the dependent variable vector);

  • variable.names - variable names corresponding to model but without dependent variable;

Usage

glm(model)

glm(model, family="gaussian", iterations=10, sole.method=NULL, sole.iterations=10)

Arguments

model

linear model formula

family

a string indicating the distribution of the dependent variable and the type of link function (either “gaussian” or “binomial-logit”)

iterations

number of iterations of the algorithm

sole.method

a string indicating the method used for solving systems of linear equations (either “gauss”, “lu”, “conjugate-gradient” or “inversion”)

sole.iterations

if sole.method is “conjugate-gradient”, the number of iterations to run the conjugate gradient algorithm

Supported

predict

glmnet

Description

Fits generalized linear models using Dai-Liao non-linear conjugate gradient optimization with the strong Wolfe conditions. Supported families are “gaussian”, “binomial-logit”, “gamma”, and “poisson”. When family="binomial-logit", the dependent variable must be either 0 or 1.

The function returns a list with six elements. The elements are:

  • algo - this function name;

  • coefficients - fitted coefficients (the first element is the intercept);

  • family - a string indicating the model type;

  • ml.type - a string indicating the machine learning type, needed for prediction;

  • model - list of data vectors (the first one is the dependent variable vector);

  • variable.names - variable names corresponding to model but without dependent variable;

Usage

glmnet(model)

glmnet(model, family="gaussian", iterations=10, wolfe.iterations=5,
          alpha=1, lambda.lasso=0.1, lambda.ridge=0.1)

Arguments

model

linear model formula. Applying scaled variables is highly recommended.

family

a string indicating the distribution of the dependent variable and the type of link function (either “gaussian”, “binomial-logit”, “gamma”, or “poisson” )

iterations

number of iterations of the non-linear CG optimization. Start with a small number and increase gradually as needed.

wolfe.iterations

number of iterations used to satisfy the Wolfe conditions. Start with a small number and increase gradually as needed.

alpha

the elastic net mixing parameter. alpha=1 is the lasso (L1) penalty, and alpha=0 is the ridge (L2) penalty. alpha must be in the range [0, 1].

lambda.lasso

the lasso regularization parameter

lambda.ridge

the ridge regularization parameter

Example

# Using IRIS datasets. y.train is "pw" column.
result <- glmnet(
    y.train ~ X.train$sl + X.train$sw + X.train$pl,
    family="gaussian",
    iterations=20,
    wolfe.iterations=10,
    alpha=0.5,
    lambda.lasso=0.2,
    lambda.ridge=0.3)

Supported

predict

head

Description

Take the first n elements of a vector.

Usage

head(x, n)

head(x, n=6)

Arguments

x

a private vector

n

the number of elements to take. If n is negative, takes every element of the vector except the last abs(n).
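
The handling of a negative n matches Python slice semantics, which makes for a compact illustration on a public list. The function name head_sketch is hypothetical.

```python
def head_sketch(x, n=6):
    """First n elements; for negative n, everything except the last abs(n)."""
    return list(x[:n])


print(head_sketch([1, 2, 3, 4, 5, 6, 7, 8]))   # default n=6
print(head_sketch([1, 2, 3, 4, 5], -2))        # drops the last two elements
```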

heatmap

Description

Plots a heatmap from private data. This is a replacement for X-Y plots in ordinary statistics software. The function takes two private vectors consisting of the x and y coordinates respectively. Instead of drawing each point separately, the number of points falling in an area is counted and a shade is assigned to the area.

Usage

heatmap(x, y)

heatmap(x, y, xlab=NULL, ylab=NULL, regression=FALSE, ratio=4/3)

Arguments

x

private vector with x coordinates

y

private vector with y coordinates

xlab

x axis label

ylab

y axis label

regression

a boolean indicating if a linear regression line should also be plotted

ratio

width/height ratio

hist

Description

Plots a histogram of a private vector.

Usage

hist(x)

hist(x, title=NULL, xlab=NULL, ylab=NULL, ylim=NULL, freq=TRUE, ratio=4/3)

Arguments

x

private vector

title

plot title

xlab

x axis label

ylab

y axis label

ylim

a two-element vector specifying the minimum and maximum values of the y axis

freq

a boolean indicating if the histogram should consist of frequencies (TRUE) or percentages

ratio

width/height ratio

hoslem.test

Description

Hosmer-Lemeshow goodness of fit test for binary classifiers.

Usage

hoslem.test(observations, predictions)

hoslem.test(observations, predictions, significance=0.05, groups=10)

Arguments

observations

vector of observed labels (0 or 1)

predictions

vector of classifier predictions (between 0 and 1)

significance

(optional) significance level

groups

(optional) number of groups used

import

Description

Read and evaluate a file containing Rmind code. Unlike the source procedure, the path passed to import is relative to the path of the source code file containing the call to import.

Arguments

filename

relative path of the source code file

integer

Description

Creates a vector of integers.

Arguments

length

length of the vector

is.available

Description

Given a private vector, returns a private vector indicating which values are not missing.

Arguments

x

a private vector

is.binary

Description

Returns a public boolean stating whether all values in a given private vector are binary (zeroes and ones).

Arguments

x

a private vector

is.null

Description

Checks if a value is NULL.

Usage

is.null(x)

Arguments

x

a value of any type

lapply

Description

Apply a function to each element of a list or a public vector. If the input is a vector it’s converted into a list. Use sapply if you want the vector to remain a vector. Since the function exists on the client side, it can only be applied to public inputs.

Arguments

x

list or public vector

fun

function with one argument

Example

# Add one to each element of a vector
lapply(vec, function(x) x + 1)

length

Description

Returns the length of a vector or a list.

Arguments

x

a vector or a list

levels

Description

Get the list of levels of a private factor vector. The list elements are integer codes of the levels and are labeled with the string representation of the level.

Arguments

x

private factor vector

list

Description

Constructs a list of the arguments passed to it. List elements can be named by giving the name and value pairs as keyword arguments, e.g. name = value.

Arguments

objects that will form the list

lm

Description

Perform linear regression. The methods used for solving systems of linear equations are “lu” (LU decomposition), “gauss” (Gaussian elimination), “conjugate-gradient” (conjugate gradient method) and “inversion” (matrix inversion). The conjugate gradient method is fast, but its accuracy depends on the number of iterations. Matrix inversion is fast but is only implemented for fewer than four variables. If the method argument is NULL, either matrix inversion or LU decomposition is chosen depending on the number of variables. The function returns a vector with the estimated coefficients. The first element is the intercept.

Usage

lm(model)

lm(model, method=NULL, iterations=10)

Arguments

model

model formula

method

a string indicating the method used for solving systems of linear equations (either “gauss”, “lu”, “conjugate-gradient” or “inversion”)

iterations

if the method is “conjugate-gradient”, the number of iterations to run the conjugate gradient algorithm

Example

# Specifying a model with two explanatory variables
lm(y ~ x1 + x2)

ln

Description

Compute the natural logarithm of a numeric vector.

Arguments

x

public or private vector

load

Description

Loads a Sharemind table.

Arguments

ds

name of the data source

table

name of the table

loess

Description

Perform LOESS regression with one explanatory variable and linear local regression.

Usage

loess(x, y)

loess(x, y, span=0.75, points=10)

Arguments

x

independent variable sample

y

dependent variable sample

span

fraction of points to use for local regression (optional)

points

number of local regressions (optional)

log

Description

Compute the logarithm of a numeric vector.

Usage

log(x, base)

log(x, base=exp(1))

Arguments

x

public or private vector

base

logarithm base. Public or private vector.

log10

Description

Compute the base 10 logarithm of a numeric vector.

Arguments

x

public or private vector

logical

Description

Creates a vector of booleans.

Arguments

length

length of the vector

ls

Description

List variables in the current environment.

ls.tables

Description

List database tables in a data source.

Arguments

ds

data source

mad

Description

Calculates the median absolute deviation of a private vector (median(abs(x - median(x))) * constant).

Usage

mad(x)

mad(x, constant=1.4826)

Arguments

x

private vector

constant

scale factor
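
The formula in the description can be sketched in plain Python on public data. This only illustrates the arithmetic and is not the Rmind implementation; the function name mad is reused here for readability. With the default constant of 1.4826 the result is a consistent estimate of the standard deviation for normally distributed data.

```python
from statistics import median

def mad(x, constant=1.4826):
    """Median absolute deviation: median(|x - median(x)|) * constant."""
    center = median(x)
    deviations = [abs(v - center) for v in x]
    return median(deviations) * constant

sample = [1, 2, 3, 4, 100]
unscaled = mad(sample, constant=1)  # raw median absolute deviation
scaled = mad(sample)                # scaled with the default constant
print(unscaled, scaled)
```

Note that the single extreme value 100 barely affects the result, which is why the MAD is often preferred over the standard deviation for outlier detection.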

mann.whitney.test

Description

Perform the Mann-Whitney U test.

Usage

mann.whitney.test(x, y)

mann.whitney.test(x, y, alternative="two.sided", significance=0.05, correct=TRUE)

Arguments

x

private vector of the first sample

y

private vector of the second sample

alternative

a string indicating the type of the alternative hypothesis (either “two.sided”, “less” or “greater”)

significance

significance level

correct

a boolean indicating if tied ranks should be replaced with their average. This will be slower but less conservative.

matrix

Description

Constructs a private matrix.

Arguments

private column vectors

max

Description

Returns the largest element of a private vector.

Arguments

x

private vector

mean

Description

Calculates the arithmetic mean of a private vector.

Arguments

x

private vector

median

Description

Calculates the median of a private vector.

Arguments

x

private vector

merge

Description

Merge two Sharemind database tables. The original tables remain unchanged. If by, by.x and by.y are unspecified the procedure attempts to find a column with a common name. If multiple such pairs exist, use the by argument. If the key column is named differently in each table, use by.x and by.y. To perform a cross join give NULL as the value of by. Use a list of strings to specify a multi-column key.

Note that the order of rows in the output is not deterministic.

Usage

merge(x, y)

merge(x, y, by="", by.x=NULL, by.y=NULL, all=FALSE, all.x=FALSE, all.y=FALSE)

Arguments

x

first Sharemind table

y

second Sharemind table

by

name of the key column

by.x

name of the key column of the first table

by.y

name of the key column of the second table

all

a boolean indicating whether all rows of both tables should be in the result table (full outer join)

all.x

a boolean indicating whether all rows of the first (left) table should be in the result table (left outer join)

all.y

a boolean indicating whether all rows of the second (right) table should be in the result table (right outer join)
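
The join variants selected by all, all.x and all.y can be illustrated with a small Python sketch over public data. It is only a model of the semantics under stated assumptions: tables are lists of dictionaries, there is a single key column, non-key column names do not clash, and None stands in for a missing value. The helper name merge_tables is hypothetical.

```python
def merge_tables(x, y, by, all_x=False, all_y=False):
    """Join two tables (lists of dicts) on the key column `by`."""
    x_cols = [c for c in (x[0] if x else {}) if c != by]
    y_cols = [c for c in (y[0] if y else {}) if c != by]
    result = []
    matched_y = set()
    for rx in x:
        hits = [j for j, ry in enumerate(y) if ry[by] == rx[by]]
        for j in hits:
            matched_y.add(j)
            result.append({**rx, **y[j]})
        if not hits and all_x:
            # Left outer join: keep the unmatched left row, pad right columns.
            result.append({**rx, **{c: None for c in y_cols}})
    if all_y:
        # Right outer join: keep unmatched right rows, pad left columns.
        for j, ry in enumerate(y):
            if j not in matched_y:
                result.append({**{c: None for c in x_cols}, **ry})
    return result

people = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
wages = [{"id": 2, "wage": 10}, {"id": 3, "wage": 20}]

inner = merge_tables(people, wages, "id")
left = merge_tables(people, wages, "id", all_x=True)
full = merge_tables(people, wages, "id", all_x=True, all_y=True)
print(len(inner), len(left), len(full))
```

Setting both flags corresponds to all=TRUE (full outer join). Unlike this sketch, the real procedure does not guarantee any row order.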

min

Description

Returns the smallest element of a private vector.

Arguments

x

private vector

month

Description

Extract months from dates. The result is an int64 private vector.

Arguments

x

private vector containing dates

mse

Description

Calculates the mean squared error (MSE) of a regression model.

Usage

mse(actual, predicted)

Arguments

actual

private vector of ground truth values

predicted

private vector of predicted values

multiple.chisq.test

Description

Perform simultaneous chi-squared tests. Note that the data will be converted to an unsigned integral type so it should not contain negative values. The codebook is a list of two vectors. The first contains the values expected in the input and the second contains values that they are assigned to. This can be used to turn multiple categories into one. If you do not wish to change categories, pass list(a:b, a:b) as the codebook argument, where a is the minimum and b the maximum expected value. The Benjamini-Hochberg correction is less conservative but slower. Returns a list with indices of the data sets for which the null hypothesis was rejected.

Usage

multiple.chisq.test(dataList, filter, codebook)

multiple.chisq.test(dataList, filter, codebook, significance=0.05, method="benjamini-hochberg")

Arguments

x

list of private vectors. The vectors can not contain missing values.

filter

a private filter vector. Should be a boolean vector or consist of zeroes and ones. This is used to divide each private vector into two groups.

codebook

codebook list

significance

significance level

method

correction method (either “bonferroni” or “benjamini-hochberg”)
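
The codebook described above is a pair of parallel vectors. A minimal Python sketch of how such a mapping merges categories, on public data, might look as follows. The helper name apply_codebook is hypothetical, and the behaviour for values missing from the codebook (here a KeyError) is an assumption, since the documentation does not specify it.

```python
def apply_codebook(values, codebook):
    """Map each input value to its assigned category."""
    expected, assigned = codebook
    mapping = dict(zip(expected, assigned))
    return [mapping[v] for v in values]

# Merge categories 1 and 2 into 1, and categories 3 and 4 into 2.
merging = ([1, 2, 3, 4], [1, 1, 2, 2])
recoded = apply_codebook([1, 3, 4, 2, 2], merging)

# Identity codebook, the analogue of list(a:b, a:b) with a=1 and b=4.
identity = (list(range(1, 5)), list(range(1, 5)))
unchanged = apply_codebook([1, 3, 4, 2, 2], identity)
print(recoded, unchanged)
```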

multiple.t.test

Description

Perform simultaneous t-tests. The tests are two-sided (the alternative hypothesis is that the means of the groups are different). The variances of the two groups are assumed to be equal. There is no paired version. The Benjamini-Hochberg correction is less conservative but slower. Returns a list with indices of the data sets for which the null hypothesis was rejected.

Usage

multiple.t.test(dataList, filter)

multiple.t.test(dataList, filter, significance=0.05, method="benjamini-hochberg")

Arguments

x

list of private vectors. The vectors can not contain missing values.

filter

a private filter vector. Should be a boolean vector or consist of zeroes and ones. This is used to divide each private vector into two groups.

significance

significance level

method

correction method (either “bonferroni” or “benjamini-hochberg”)
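
To show why the Benjamini-Hochberg correction is less conservative than Bonferroni, here is a short Python sketch of the two textbook procedures applied to public p-values. It is not the Rmind implementation; the function names are hypothetical and the returned indices are 0-based, which is an assumption of this sketch only.

```python
def bonferroni(p_values, significance=0.05):
    """Reject test i when p_i <= significance / m."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= significance / m]

def benjamini_hochberg(p_values, significance=0.05):
    """Reject the k smallest p-values, where k is the largest rank
    with p_(k) <= k * significance / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * significance / m:
            cutoff = rank
    return sorted(order[:cutoff])

p = [0.001, 0.02, 0.03, 0.5]
rejected_bonferroni = bonferroni(p)
rejected_bh = benjamini_hochberg(p)
print(rejected_bonferroni, rejected_bh)
```

Benjamini-Hochberg has to sort the p-values, which is one plausible reason for it being the slower option on private data.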

multiplot

Description

Combine multiple plots. The layout can be controlled with the rows and cols arguments. Either one can be specified and the other will be calculated. If neither is specified, the plots are stacked vertically.

Usage

multiplot(plotList)

multiplot(plotList, rows=NULL, cols=NULL, ratio=NULL)

Arguments

x

list of plots

rows

number of rows in layout

cols

number of columns in layout

ratio

width/height ratio

names

Description

Returns a list with the names of columns of a Sharemind database table or names of list elements.

Arguments

x

Sharemind table or list

ncol

Description

Returns the number of columns in a Sharemind database table.

Arguments

x

Sharemind table

nrow

Description

Returns the number of rows in a Sharemind database table.

Arguments

x

Sharemind table

plot

Description

Produce an X-Y plot of public values.

The legend position consists of four components - vertical position (top, bottom or center), horizontal position (left, right or center), position in relation to the frame (inside or outside) and position of elements inside the legend (vertical or horizontal).

Usage

plot(x, y)

plot(x, y, title="", xlab="", ylab="", ylim=NULL, type=NULL, label.points=FALSE, data.labels=NULL, legend.pos="top right inside vertical", ratio=4/3)

Arguments

x

x axis coordinate vector (or list of vectors)

y

y axis coordinate vector (or list of vectors)

title

plot title

xlab

x axis label

ylab

y axis label

ylim

a two-element vector specifying the minimum and maximum elements of the y axis

type

a string indicating the type of the plot (either “p” for points, “l” for lines or “b” for both)

label.points

a boolean indicating if points should have labels

data.labels

list of data set names

legend.pos

legend position

ratio

width/height ratio

Examples

# A plot with a single data set
plot(1:10, 1:10)

# A plot with two data sets
plot(list(x1, x2), list(y1, y2))

prcomp

Description

Perform principal component analysis. A list with three values is returned:

  • loads is a list of vectors where the nth vector consists of the coordinates of the nth principal component;

  • scores is a private matrix consisting of the data transformed to the principal component space;

  • residuals is the residual matrix (difference between the data matrix and the matrix reconstructed from the principal components).

Usage

prcomp(matrix)

prcomp(matrix, components=1, iterations=10)

Arguments

x

private data matrix (each column is a variable)

components

(optional) the number of components to find

iterations

(optional) the number of iterations to run the algorithm

precision

Description

Calculates the precision score of a classification model.

Usage

precision(actual, predicted)

precision(actual, predicted, average="macro")

Arguments

actual

private vector of ground truth labels

predicted

private vector of predicted labels

average

a string indicating the averaging method. It does not need to be specified for binary classification. For multi-class classification, choose one of “macro”, “micro” or “weighted”.
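
The three averaging modes can be sketched in Python on public label vectors. The definitions used here follow the common convention (macro: unweighted mean of per-class precision; micro: pooled true and false positives; weighted: per-class precision weighted by class frequency in actual). That Rmind uses exactly these definitions, and that a class with no predictions counts as precision 0, are assumptions of this sketch.

```python
from collections import Counter

def precision(actual, predicted, average="macro"):
    labels = sorted(set(actual) | set(predicted))
    tp, fp = Counter(), Counter()
    for a, p in zip(actual, predicted):
        if a == p:
            tp[p] += 1  # correct prediction of class p
        else:
            fp[p] += 1  # class p predicted, but the truth differs

    def per_label(label):
        denom = tp[label] + fp[label]
        return tp[label] / denom if denom else 0.0

    if average == "micro":
        # Every prediction is either a true or a false positive.
        return sum(tp.values()) / len(predicted)
    if average == "macro":
        return sum(per_label(l) for l in labels) / len(labels)
    if average == "weighted":
        support = Counter(actual)
        return sum(per_label(l) * support[l] for l in labels) / len(actual)
    raise ValueError("unknown average: " + average)

actual = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
macro = precision(actual, predicted, "macro")
micro = precision(actual, predicted, "micro")
weighted = precision(actual, predicted, "weighted")
print(macro, micro, weighted)
```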

predict

Description

Predict values based on the result of a modeling function such as glm.

Usage

predict(result, data)

Arguments

result

the result of a modeling function

data

a Sharemind table with the same structure as the data that was specified in the model argument of the modeling function

Example

# Using IRIS datasets. y.train is "pw" column.
result <- glmnet(
    y.train ~ X.train$sl + X.train$sw + X.train$pl
)

prediction <- predict(result, X.test)

Supporting

  • glm

  • glmnet

  • ctree

print

Description

Prints and returns its argument.

Arguments

x

an object whose value will be printed

q

Description

Quits the application.

Arguments

status

status code. Note that in the case of code 0, the success code of the platform is actually returned (which is usually 0).

quantile

Description

Compute sample quantiles.

Usage

quantile(x)

quantile(x, probs=c(0, 0.25, 0.5, 0.75, 1))

Arguments

x

sample vector

probs

vector of quantile probabilities (optional)

randomForest

Description

Fits a random forest classifier.

The function returns a list with seven elements. The elements are:

  • algo - this function name;

  • coefficients - a private matrix indicating variable importance. To get each node information, use randomForest.importance();

  • max.depth - a number indicating the maximum depth of the tree;

  • min.label - a number indicating the minimum label, needed for prediction;

  • ml.type - a string indicating the machine learning type, needed for prediction;

  • ntree - a number indicating the number of trees;

  • variable.names - names of the explanatory variables (the dependent variable is excluded).

Usage

randomForest(model)

randomForest(model, max.depth=3, ntree=5)

Arguments

model

linear model formula. Scaling is not necessary.

max.depth

a number indicating the maximum depth of each tree

ntree

a number indicating the number of trees

Example

# Using IRIS datasets. y.train is "species" column.
result <- randomForest(
    y.train ~ X.train$sl + X.train$sw + X.train$pl + X.train$pw,
    max.depth=3,
    ntree=5
)

Supported

predict

rbind

Description

Concatenate tables by row. If the result.ds and result.table arguments are not given, a temporary table is created.

Usage

rbind(table1, table2)

rbind(table1, table2, result.ds=NULL, result.table=NULL, result.overwrite=FALSE, result.names=NULL)

Arguments

positional arguments are tables or lists of columns designating tables

result.ds

(optional) data source of the result table

result.table

(optional) name of the result table

result.overwrite

whether to overwrite table with the name given in argument result.table

result.names

a list of column names of the result table

recall

Description

Calculates the recall score of a classification model.

Usage

recall(actual, predicted)

recall(actual, predicted, average="macro")

Arguments

actual

private vector of ground truth labels

predicted

private vector of predicted labels

average

a string indicating the averaging method. It does not need to be specified for binary classification. For multi-class classification, choose one of “macro”, “micro” or “weighted”.

recode

Description

Replace the level-to-code mapping of a factor.

Arguments

x

private vector

mapping

list of level=code pairs

rep

Description

Return a list or a vector with multiple copies of a value. If the input is a vector, a vector with count copies is returned. If the input is not a vector, a list with count copies is returned.

Arguments

x

value

count

number of copies

rm

Description

Remove variables from the current environment.

Usage

rm(x)

rm(x, list=NULL)

Arguments

variable names given as variables or strings

list

(optional) list of variable names as strings

Examples

# Remove all user-defined variables
rm(list=ls())

rm.missing

Description

Remove missing values.

Usage

rm.missing(vector, value)

rm.missing(vector, value=NULL)

Arguments

x

a private vector or matrix

value

an argument specifying the value that missing values are replaced by. It can be a scalar or a list of scalars if the input is a matrix in which case the list elements give replacement values for each matrix column. If no value is specified, missing values will be replaced by 0 or FALSE.

rm.outliers

Description

Remove outliers of a private vector. The length of the vector will remain the same. Outliers will be marked as not available. There are two methods: “quantiles” and “mad”. If the method is “quantiles” and the constant argument is x, values below the (x * 100%) quantile and above the ((1 - x) * 100%) quantile are removed; the default constant is 0.05. If the method is “mad”, the median absolute deviation method of outlier detection is used and the constant is the lambda of the MAD formula; the default constant is 3.

Usage

rm.outliers(x)

rm.outliers(x, method="quantiles", constant=NULL)

Arguments

x

private vector

method

a string indicating the type of method used (either “quantiles” or “mad”)

constant

numeric constant used by the method
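
A rough Python sketch of the “quantiles” method on public data is given below. It assumes linearly interpolated quantiles, which may differ from the quantile definition Rmind uses, and it uses None to stand in for a value marked as not available. Both helper names are hypothetical.

```python
def interpolated_quantile(sorted_x, p):
    """Quantile with linear interpolation between order statistics."""
    pos = p * (len(sorted_x) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])

def rm_outliers_quantiles(x, constant=0.05):
    """Mark values outside the [constant, 1 - constant] quantile range."""
    s = sorted(x)
    low = interpolated_quantile(s, constant)
    high = interpolated_quantile(s, 1 - constant)
    # The output keeps the input length; outliers become None.
    return [v if low <= v <= high else None for v in x]

data = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, -100, 100]
cleaned = rm_outliers_quantiles(data)
print(cleaned)
```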

rm.table

Description

Remove a database table.

Usage

rm.table(ds, table)

Arguments

ds

data source name

table

table name

rmse

Description

Calculates the root mean squared error (RMSE) of a regression model.

Usage

rmse(actual, predicted)

Arguments

actual

private vector of ground truth values

predicted

private vector of predicted values
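
For reference, the score is the square root of the mean of squared differences. A plain Python sketch on public vectors follows; the helper names mirror the Rmind procedures but this is not their implementation.

```python
import math

def mse(actual, predicted):
    """Mean of squared differences between truth and prediction."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(mse(actual, predicted))

truth = [0, 0, 0, 0]
guess = [2, 2, 2, 2]
error_mse = mse(truth, guess)
error_rmse = rmse(truth, guess)
print(error_mse, error_rmse)
```

Because of the square root, the RMSE is expressed in the same units as the predicted variable, which often makes it easier to interpret than the MSE.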

roc

Description

Plot the receiver operating characteristic curve of a binary classifier.

Note that the current implementation leaks the true positive rate and false positive rate computed for each observation. This procedure can be disabled on the server side.

Arguments

predictions

private float vector of classifier predictions

labels

private integral vector of true labels (0 or 1)

round

Description

Rounds all elements of a private vector. The resulting vector will have an integral type.

Arguments

x

a private vector

row.fold

Description

Reduces each row of a matrix to a scalar value. A reduction of x1, x2, …, xn with the associative and commutative operator * is the value x1 * x2 * … * xn.

Arguments

x

private matrix

operator

operator used for reducing ('`
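
The row-wise reduction described above can be sketched in Python on a public matrix. The sketch accepts any binary Python function; which operators Rmind itself accepts is not assumed here. The helper name row_fold is hypothetical.

```python
from functools import reduce
import operator

def row_fold(matrix, op):
    """Reduce each row x1, x2, ..., xn to x1 op x2 op ... op xn."""
    return [reduce(op, row) for row in matrix]

m = [[1, 2, 3],
     [4, 5, 6]]
row_sums = row_fold(m, operator.add)
row_products = row_fold(m, operator.mul)
print(row_sums, row_products)
```

Requiring the operator to be associative and commutative means the elements of a row can be combined in any order, so the result does not depend on how the reduction is scheduled.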

sapply

Description

Apply a function to each element of a list or a public vector. If the input is a vector or a list where all elements are vectors, the result will be a vector, otherwise it will be a list. Since the function exists on the client side, it can only be applied to public inputs.

Arguments

x

list or public vector

fun

a function with one argument

save

Description

Save a plot to a file. If the width or height is specified but not both of them, the aspect ratio of the plot is used to calculate the other one. The format of the image is inferred from the path. The supported formats are PNG and SVG.

Arguments

x

plot

path

file path

width

width of the plot

height

height of the plot

scale

Description

Returns the centered (mean 0) and scaled (standard deviation 1) Sharemind table or matrix.

Usage

scale(x, center=TRUE, scale=TRUE)

Arguments

x

Sharemind table or private data matrix

center

a boolean. If TRUE, mean of each column of the returned value will be around 0

scale

a boolean. If TRUE, standard deviation of each column of the returned value will be around 1
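
A small Python sketch of column-wise standardisation on public data is shown below. It assumes the sample standard deviation (n - 1 denominator) and that disabling center or scale simply skips that step; the documentation does not state either detail, so treat both as assumptions. The helper name scale_columns is hypothetical.

```python
from statistics import mean, stdev

def scale_columns(columns, center=True, scale=True):
    """Standardise each column: subtract the mean, divide by the sd."""
    result = []
    for col in columns:
        m = mean(col) if center else 0.0
        s = stdev(col) if scale else 1.0
        result.append([(v - m) / s for v in col])
    return result

scaled = scale_columns([[1, 2, 3, 4, 5]])[0]
print(mean(scaled), stdev(scaled))
```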

scale.minmax

Description

Transform Sharemind table or matrix by scaling each feature to a given range.

Usage

scale.minmax(x, min=0, max=1)

Arguments

x

Sharemind table or private data matrix

min

a number indicating the lower bound of the target range

max

a number indicating the upper bound of the target range
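
The usual min-max formula maps each column value v to min + (v - column minimum) / (column maximum - column minimum) * (max - min). A Python sketch for a single public column follows; that Rmind uses exactly this formula is an assumption, and constant columns (which would divide by zero) are not handled. The helper name scale_minmax is hypothetical.

```python
def scale_minmax(column, min=0, max=1):
    """Linearly map the column so its values span [min, max]."""
    lowest = __builtins_min(column)
    highest = __builtins_max(column)
    span = highest - lowest
    return [min + (v - lowest) / span * (max - min) for v in column]

# The parameter names shadow the built-ins inside the function,
# so the built-in functions are kept under separate names.
import builtins
__builtins_min = builtins.min
__builtins_max = builtins.max

unit_range = scale_minmax([2, 4, 6])
symmetric = scale_minmax([2, 4, 6], min=-1, max=1)
print(unit_range, symmetric)
```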

screeplot

Description

Plot the variance (or cumulative proportion of total variance) of each principal component computed by prcomp.

Usage

screeplot(x)

screeplot(x, npcs=NULL, cumulative=FALSE)

Arguments

x

object returned by prcomp

npcs

(optional) number of principal components to plot

cumulative

(optional) a boolean indicating whether to plot cumulative proportion of total variance

sd

Description

Calculates the standard deviation of a private vector.

Arguments

x

private vector

shift.left

Description

Left-shift elements of each row of a table. When an element of the shift vector is missing, the whole row is marked missing in the result.

Arguments

shift

a private integral vector containing the shift amount for each row

data

a matrix representing the table to be shifted

left.padding

indicates how many columns of missing values to add to the beginning of the matrix

right.padding

indicates how many columns of missing values to add to the end of the matrix

show

Description

Show a plot in a window. If the width or height is specified but not both of them, the aspect ratio of the plot is used to calculate the other one. In the REPL, just evaluating a plot will also show it.

Usage

show(plot)

show(plot, width=NULL, height=NULL)

Arguments

x

plot

width

width

height

height

sin

Description

Compute the sine function of a numeric vector.

Arguments

x

public or private vector

sort

Description

Sorts a Sharemind database table. A new table is created and the original data remains unchanged.

Arguments

x

Sharemind table

by

a string or a list of strings. The table will be sorted by those columns. Sorting is performed backwards so that the first column to sort by will be ordered in the resulting database. The next column will be ordered where first column elements are equal and so on.

dir

a string or a list of strings which specifies the sorting direction for each column in the by argument. The possible values are "ascending" and "descending".

na

a string or a list of strings which specifies the order of missing values. The possible values are "first" (missing values come before other values) and "last" (missing values come after other values).
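
The "sorting is performed backwards" remark describes the standard way to obtain a multi-column ordering from a stable single-column sort: sort by the last key first and by the first key last. A Python sketch on public rows follows. The helper name sort_table is hypothetical and the na argument is left out.

```python
def sort_table(rows, by, dir):
    """Multi-key sort built from repeated stable single-key sorts."""
    result = list(rows)  # the original table is left unchanged
    for column, direction in reversed(list(zip(by, dir))):
        # list.sort is stable, also with reverse=True, so ties keep
        # the order produced by the previously applied (later) keys.
        result.sort(key=lambda r: r[column],
                    reverse=(direction == "descending"))
    return result

rows = [
    {"name": "b", "year": 2020},
    {"name": "a", "year": 2021},
    {"name": "a", "year": 2020},
]
ordered = sort_table(rows, ["name", "year"], ["ascending", "descending"])
print(ordered)
```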

source

Description

Read and evaluate a file containing Rmind code.

Arguments

filename

path of the source code file

sqrt

Description

Compute the square root of a numeric vector.

Arguments

x

public or private vector

store.table

Description

Stores private vectors into a permanent database table.

Usage

store.table(dsName, tableName, columnNamesList, columnsList)

store.table(dsName, tableName, columnNamesList, columnsList, overwrite=FALSE)

Arguments

data.source

data source name

table.name

table name

column.names

list of column names

columns

list of private vectors

overwrite

a boolean indicating whether to overwrite a table if a table exists in the same data source with the same name

subset

Description

Returns a filtered subset of a Sharemind database table.

Note that the order of rows in the output is not deterministic.

Usage

subset(x, filter)

subset(x, filter, column=NULL)

Arguments

table

Sharemind table

filter

filter vector

column

if this argument is supplied, only the column vector with this name is filtered and returned

Examples

# Assume variable x is a table with columns a and b.
# Returns a vector containing elements of column b
# where elements of column a are positive and elements of
# b are not 42.
subset(x, a > 0 & b != 42, b)

sum

Description

Calculates the sum of a vector.

Arguments

x

private or public vector

summary.glm

Description

Compute the Akaike information criterion, standard errors and p-values of coefficients of generalized linear models. See the documentation for the glm procedure. A list with three values is returned:

  • std.errors is a vector consisting of the standard errors of the coefficients;

  • p.values is a vector consisting of the Wald test p-values of the coefficients;

  • aic is the Akaike information criterion.

Arguments

model

the value returned by the glm procedure

summary

Description

Calculates the minimum, lower quantile, median, mean, upper quantile and maximum of a private vector (in that order).

Arguments

x

private vector

summary.prcomp

Description

Display the standard deviation, proportion of total variance and cumulative proportion of principal components computed by prcomp.

Arguments

x

object returned by prcomp

t.test

Description

Performs Student’s or Welch’s t-test. If var.equal is FALSE, the Welch-Satterthwaite equation is used to find the degrees of freedom. Returns TRUE if the null hypothesis is rejected and FALSE otherwise.

Usage

t.test(x, y)

t.test(x, y, paired=FALSE, var.equal=TRUE, significance=0.05, alternative="two.sided", mu=0)

Arguments

x

private vector of the first sample

y

private vector of the second sample

paired

a boolean indicating if a paired t-test should be performed

var.equal

a boolean indicating if the variances of the samples are expected to be equal

significance

significance level

alternative

a string indicating the type of the alternative hypothesis (either “two.sided”, “less” or “greater”)

mu

a number indicating the difference of means
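
For readers unfamiliar with the Welch-Satterthwaite equation mentioned in the description, the degrees of freedom are computed from the sample variances and sizes as sketched below in Python on public data. This shows the standard formula only; the helper name welch_df is hypothetical and nothing here reflects how Rmind evaluates it on private values.

```python
from statistics import variance

def welch_df(x, y):
    """Welch-Satterthwaite approximation of the degrees of freedom."""
    vx = variance(x) / len(x)  # s_x^2 / n_x
    vy = variance(y) / len(y)  # s_y^2 / n_y
    return (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))

first = [1, 2, 3, 4, 5]
second = [2, 3, 4, 5, 6]
df = welch_df(first, second)
print(df)
```

When both samples have the same size n and the same variance, the formula reduces to 2 * (n - 1), the degrees of freedom of Student's t-test.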

tail

Description

Take the last n elements of a vector.

Usage

tail(x, n)

tail(x, n=6)

Arguments

x

a private vector

n

the number of elements to take. If n is negative, takes every element of the vector except the first abs(n).
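
The handling of positive and negative n can be sketched with Python list slicing on a public list. The function name is reused for readability only.

```python
def tail(x, n=6):
    """Last n elements, or everything except the first abs(n) if n < 0."""
    if n >= 0:
        return x[len(x) - min(n, len(x)):]
    return x[abs(n):]

values = list(range(1, 11))  # 1, 2, ..., 10
last_six = tail(values)
last_two = tail(values, 2)
without_first_three = tail(values, -3)
print(last_six, last_two, without_first_three)
```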

toString

Description

Returns a string representation of a value.

Arguments

x

any value

train.test.split

Description

Splits a dataset into training and test sets.

This function returns a named list with two new Sharemind tables (train and test).

Arguments

data

Sharemind table

train.size

ratio for train data size.

random.state

a number used as the random state. If NULL, the random state is itself chosen randomly.

tryCatch

Description

Evaluate an expression and handle errors. The function passed as the error parameter will be executed when expr throws an exception. The error message will be passed to the error function. There is also a finally argument which will be evaluated after evaluation and error handling.

Usage

tryCatch(expr, error.handler, finally.expr)

Arguments

expr

expression to evaluate

error

error handling function

finally

expression to evaluate after error handling
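
The control flow described above closely resembles try / except / finally in other languages. The following Python analogue is a sketch of that flow, not of Rmind's evaluator. The helper name try_catch is hypothetical, and returning the handler's value as the overall result is an assumption.

```python
def try_catch(expr, error=None, finally_=None):
    """Evaluate expr(); on failure pass the message to error()."""
    try:
        return expr()
    except Exception as exc:
        if error is None:
            raise
        # The handler receives the error message, not the exception.
        return error(str(exc))
    finally:
        # Runs after evaluation and after any error handling.
        if finally_ is not None:
            finally_()

events = []
outcome = try_catch(
    lambda: 1 / 0,
    error=lambda message: "handled: " + message,
    finally_=lambda: events.append("cleanup"),
)
print(outcome, events)
```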

typeof

Description

Returns a string representation of the type of the first argument.

Arguments

x

any object

unique.id

Description

Computes a public unique ID from columns of the input table. Does not modify the input table but returns a new table that contains the original columns and an extra UID column. The rows of the resulting table are shuffled.

Unique ID column can be later used to sort and aggregate.

Usage

unique.id(table, "column")

unique.id(table, "column", name="UID")

Arguments

x

Sharemind table

by

a string or a list of strings denoting columns of the input table. The input columns will be combined row-wise to form the UID. If two UID rows are equal, this means that (with extremely high probability) the rows of the original data are also equal. If two UID rows are not equal, this means that the rows of the original data were not equal either.

name

(optional) name of the added UID column. The default name is “UID”.

Example

# Create a unique ID column which can be used for fast aggregation
t <- unique.id(t, list("name", "year"))
t <- aggregate(t$UID, list(t$wage), list("avg"))

unique

Description

Remove duplicate elements from a private vector.

Arguments

x

private vector

var

Description

Calculates the variance of a private vector.

Arguments

x

private vector

wilcoxon.test

Description

Perform Wilcoxon rank sum or Wilcoxon signed rank tests. If paired is TRUE then the Wilcoxon signed rank test is performed, otherwise the Wilcoxon rank sum test is performed. Returns TRUE if the null hypothesis is rejected and FALSE otherwise.

Usage

wilcoxon.test(x, y)

wilcoxon.test(x, y, paired=FALSE, alternative="two.sided", significance=0.05, correct=TRUE)

Arguments

x

private vector of the first sample

y

private vector of the second sample

paired

a boolean indicating if the samples are paired or not

alternative

a string indicating the type of the alternative hypothesis (either “two.sided”, “less” or “greater”)

significance

significance level

correct

a boolean indicating if tied ranks should be replaced with their average. This will be slower but less conservative.

write

Description

Write a string to a file.

Usage

write(string, handle)

Arguments

x

string

handle

file handle

xgboost

Description

Fits an XGBoost model.

The function returns a list with 14 elements. The elements are:

  • algo - this function name;

  • coefficients - a private matrix indicating variable importance;

  • eta - a number indicating the used step size shrinkage;

  • gamma - a number indicating the used minimum loss reduction;

  • lambda - a number indicating the used L2 regularization penalty;

  • max.depth - a number indicating the used maximum depth of the tree;

  • min.child.weight - a number indicating the used minimum sum of instance weight needed in a child node;

  • ml.type - a string indicating the machine learning type, needed for prediction;

  • nrounds - a number indicating the fitted number of trees;

  • nsplits - a number indicating the used number of bins for the approximate greedy algorithm (nsplits corresponds to 1/sketch_eps);

  • objective - a string indicating the used objective for fitting;

  • split.is.global - a boolean indicating whether the approximate greedy algorithm is performed globally or locally;

  • subsampling - a number indicating the used subsampling ratio of the fitting instances;

  • variable.names - names of the explanatory variables (the dependent variable is excluded).

Usage

xgboost(model)

params <- list(
  eta=0.3,
  gamma=1,
  lambda=1,
  objective="regression",
  max.depth=3,
  min.child.weight=1,
  subsampling=0.7,
  nsplit=20,
  split.is.global=TRUE
)
xgboost(model, params=params, nrounds=3)

Arguments

model

linear model formula. Scaling is not necessary.

params

a named list of parameters. Each element follows its definition in https://xgboost.readthedocs.io/en/latest/parameter.html. If the list contains only some of the above elements, the remaining elements take their default values. See the Value Restrictions below.

nrounds

a number indicating the number of trees

Example

# Using IRIS datasets. y.train is "species" column.
result <- xgboost(
    y.train ~ X.train$sl + X.train$sw + X.train$pl + X.train$pw,
    params=list(objective="multi")
)

Supported

predict

Value Restrictions

  • eta - (0, 1];

  • gamma - [0,inf];

  • lambda - [0,inf];

  • objective - either “regression”, “binary”, “multi”. If “binary”, the labels must be either 0 or 1;

  • max.depth - (0, inf];

  • min.child.weight - (0, 1];

  • subsampling - (0, 1];

  • nsplit - (0, inf];

year

Description

Extract years from dates. The result is an int64 private vector.

Arguments

x

private vector containing dates