shared3p_statistics_outliers.sc

Module with functions for detecting unexpected elements in a dataset.

Functions:

outlierDetectionMAD

Outlier detection (using median absolute deviation)

Detailed Description

D - shared3p protection domain

Supported types - int32 / int64 / float32 / float64

Constant 1.0 is used as the parameter for median absolute deviation.

Parameters

data

- input vector

isAvailable

- vector indicating which elements of the input vector are available

lambda

- constant multiplier applied to the MAD when deciding whether a point is an outlier. A suitable value depends on the dataset; anything from 3 to 5 can be used as a starting value.

Returns a boolean mask vector. For each sample point x, the corresponding mask element is true if the corresponding isAvailable element is true and the absolute deviation of x from the median of the sample does not exceed lambda · MAD, where MAD is the median absolute deviation of the sample.

Leaks the number of missing values in the input
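
The mask rule described above can be illustrated on cleartext data. The following Python sketch is not the module's secret-shared implementation; it only mirrors the documented semantics. The function name outlier_mask_mad is hypothetical, and two points are assumptions: the median and MAD are computed over the available elements only, and the constant 1.0 is treated as a scale factor on the MAD.

```python
from statistics import median


def outlier_mask_mad(data, is_available, lam, constant=1.0):
    """Cleartext sketch: True marks an available point within lam * MAD of the median."""
    # Assumption: unavailable elements are excluded from the median and the MAD.
    sample = [x for x, ok in zip(data, is_available) if ok]
    if not sample:
        return [False] * len(data)
    med = median(sample)
    # Assumption: the documented constant 1.0 scales the raw MAD.
    mad = constant * median([abs(x - med) for x in sample])
    return [bool(ok and abs(x - med) <= lam * mad)
            for x, ok in zip(data, is_available)]


values = [10, 11, 12, 13, 14, 100, 999]
available = [True, True, True, True, True, True, False]
print(outlier_mask_mad(values, available, 3.0))
# -> [True, True, True, True, True, False, False]
```

Here the median of the available points is 12.5 and the MAD is 1.5, so with lambda = 3 only points within 4.5 of the median are kept. The value 100 is masked out as an outlier, and 999 is masked out because it is unavailable.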

Function Overloads

D bool outlierDetectionMAD(D int32[[1]] data, D bool[[1]] isAvailable, float32 lambda)

D bool outlierDetectionMAD(D int64[[1]] data, D bool[[1]] isAvailable, float64 lambda)

D bool outlierDetectionMAD(D float32[[1]] data, D bool[[1]] isAvailable, float32 lambda)

D bool outlierDetectionMAD(D float64[[1]] data, D bool[[1]] isAvailable, float64 lambda)

outlierDetectionQuantiles

Outlier detection (using quantiles)

Detailed Description

D - shared3p protection domain

Supported types - int32 / int64 / float32 / float64

Parameters

p

- quantile probability (between 0 and 1). The quantile Q_p is a value such that a random variable with the same distribution as the sample points will be less than Q_p with probability p.

data

- input vector

isAvailable

- vector indicating which elements of the input vector are available

Returns a boolean mask vector. For each sample point x, the corresponding mask element is true if the corresponding isAvailable element is true and Q_p < x < Q_{1-p}.

Leaks the number of missing values in the input
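
The quantile rule above can likewise be sketched on cleartext data. This Python example is illustrative only and is not the module's implementation. The names quantile and outlier_mask_quantiles are hypothetical, and the quantile estimator (linear interpolation between order statistics) is an assumption, since this page does not state which estimator is used. Quantiles are assumed to be computed over the available elements only.

```python
def quantile(sorted_sample, p):
    """Quantile by linear interpolation between order statistics (assumed estimator)."""
    pos = (len(sorted_sample) - 1) * p
    lo = int(pos)  # floor, since pos is non-negative
    hi = min(lo + 1, len(sorted_sample) - 1)
    frac = pos - lo
    return sorted_sample[lo] + frac * (sorted_sample[hi] - sorted_sample[lo])


def outlier_mask_quantiles(p, data, is_available):
    """Cleartext sketch: True marks an available point strictly between Q_p and Q_{1-p}."""
    # Assumption: unavailable elements are excluded from the quantile estimates.
    sample = sorted(x for x, ok in zip(data, is_available) if ok)
    if not sample:
        return [False] * len(data)
    q_low = quantile(sample, p)
    q_high = quantile(sample, 1.0 - p)
    return [bool(ok and q_low < x < q_high)
            for x, ok in zip(data, is_available)]


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 5]
available = [True] * 9 + [False]
print(outlier_mask_quantiles(0.25, values, available))
# -> [False, False, False, True, True, True, False, False, False, False]
```

With p = 0.25 the sketch obtains Q_p = 3 and Q_{1-p} = 7 for the nine available points. Because both inequalities are strict, the points equal to 3 and 7 are excluded, and the final element is excluded because it is unavailable.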

Function Overloads

D bool outlierDetectionQuantiles(float64 p, D int64[[1]] data, D bool[[1]] isAvailable)

D bool outlierDetectionQuantiles(float32 p, D int32[[1]] data, D bool[[1]] isAvailable)

D bool outlierDetectionQuantiles(float64 p, D float64[[1]] data, D bool[[1]] isAvailable)

D bool outlierDetectionQuantiles(float32 p, D float32[[1]] data, D bool[[1]] isAvailable)