shared3p_statistics_outliers.sc
Module with functions for detecting unexpected elements in a dataset.
Functions:
outlierDetectionMAD
Outlier detection (using median absolute deviation)
Detailed Description
D - shared3p protection domain
Supported types - int32 / int64 / float32 / float64
The median absolute deviation is computed with the scaling constant 1.0.
Parameters
data - input vector
isAvailable - vector indicating which elements of the input vector are available
lambda - constant. Its value depends on the dataset; anything from 3 to 5 can be used as a starting value.
Returns a boolean mask vector. For each sample point x, the corresponding mask element is true if the corresponding isAvailable element is true and the absolute deviation of x from the sample median does not exceed lambda · MAD, where MAD is the median absolute deviation of the sample.
Leaks the number of missing values in the input.
Function Overloads
D bool[[1]] outlierDetectionMAD(D int32[[1]] data, D bool[[1]] isAvailable, float32 lambda)
D bool[[1]] outlierDetectionMAD(D int64[[1]] data, D bool[[1]] isAvailable, float64 lambda)
D bool[[1]] outlierDetectionMAD(D float32[[1]] data, D bool[[1]] isAvailable, float32 lambda)
D bool[[1]] outlierDetectionMAD(D float64[[1]] data, D bool[[1]] isAvailable, float64 lambda)
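To make the mask semantics concrete, here is a minimal plaintext sketch in Python. It is not the shared3p implementation: nothing is secret-shared, and `outlier_detection_mad` is a hypothetical name. It assumes that the median and the MAD are computed over the available elements only, and it uses the scaling constant 1.0 stated above.

```python
from statistics import median


def outlier_detection_mad(data, is_available, lam):
    """Plaintext reference for the MAD outlier mask (hypothetical helper).

    Assumption: median and MAD are taken over available elements only.
    A mask element is True when the element is available and
    |x - median| <= lam * MAD, i.e. True means "keep, not an outlier".
    """
    if len(data) != len(is_available):
        raise ValueError("data and is_available must have the same length")
    sample = [x for x, ok in zip(data, is_available) if ok]
    if not sample:
        # No available elements: nothing can be marked as valid.
        return [False] * len(data)
    med = median(sample)
    # MAD with scaling constant 1.0 (no rescaling towards a standard deviation).
    mad = 1.0 * median([abs(x - med) for x in sample])
    return [ok and abs(x - med) <= lam * mad for x, ok in zip(data, is_available)]


if __name__ == "__main__":
    print(outlier_detection_mad([10, 12, 11, 13, 12, 100, 11], [True] * 6 + [False], 3.0))
```

With the last element marked unavailable, the median and MAD of the remaining six values are 12.0 and 1.0, so only 100 lies outside the band of width 3.0 around the median, and the unavailable element is masked out as well.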
outlierDetectionQuantiles
Outlier detection (using quantiles)
Detailed Description
D - shared3p protection domain
Supported types - int32 / int64 / float32 / float64
Parameters
p - quantile probability (between 0 and 1). The quantile Q_p is a value such that a random variable with the same distribution as the sample points is less than Q_p with probability p.
data - input vector
isAvailable - vector indicating which elements of the input vector are available
Returns a boolean mask vector. For each sample point x, the corresponding mask element is true if the corresponding isAvailable element is true and Q_p < x < Q_(1-p).
Leaks the number of missing values in the input.
Function Overloads
D bool[[1]] outlierDetectionQuantiles(float32 p, D int32[[1]] data, D bool[[1]] isAvailable)
D bool[[1]] outlierDetectionQuantiles(float64 p, D int64[[1]] data, D bool[[1]] isAvailable)
D bool[[1]] outlierDetectionQuantiles(float32 p, D float32[[1]] data, D bool[[1]] isAvailable)
D bool[[1]] outlierDetectionQuantiles(float64 p, D float64[[1]] data, D bool[[1]] isAvailable)
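The quantile rule can likewise be illustrated with a plaintext Python sketch. `outlier_detection_quantiles` and `quantile_type7` are hypothetical names. The sketch assumes that quantiles are estimated only from available elements, using linear interpolation between order statistics (R's type 7 estimator, also NumPy's default); the estimator actually used by the shared3p module may differ.

```python
import math


def quantile_type7(sorted_xs, p):
    """Linear interpolation between order statistics (assumed estimator)."""
    h = (len(sorted_xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (h - lo) * (sorted_xs[hi] - sorted_xs[lo])


def outlier_detection_quantiles(p, data, is_available):
    """Plaintext reference for the quantile outlier mask (hypothetical helper).

    A mask element is True when the element is available and
    Q_p < x < Q_(1-p), with strict inequalities as documented.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if len(data) != len(is_available):
        raise ValueError("data and is_available must have the same length")
    sample = sorted(x for x, ok in zip(data, is_available) if ok)
    if not sample:
        return [False] * len(data)
    q_low = quantile_type7(sample, p)
    q_high = quantile_type7(sample, 1 - p)
    return [ok and q_low < x < q_high for x, ok in zip(data, is_available)]


if __name__ == "__main__":
    print(outlier_detection_quantiles(0.25, list(range(1, 10)), [True] * 9))
```

For the values 1 to 9 with p = 0.25, this estimator gives Q_0.25 = 3 and Q_0.75 = 7, so only 4, 5 and 6 are marked true. Because the inequalities are strict and Q_p is non-decreasing in p, any p of 0.5 or more yields an all-false mask.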