shared3p_statistics_testing.sc

Module with statistical hypothesis tests.

Functions:

chiSquared

Perform Pearson’s chi-squared test of independence.

Detailed Description

D - any protection domain

Supported types - uint32 / uint64

This version does not apply a continuity correction, so the R equivalent is chisq.test(contingencyTable, correct=FALSE)

Parameters

contTable

- contingency table in the following format (one row per option, with the counts of cases and controls in the two columns):

            Cases   Controls
  Option 1  c1      d1
  Option 2  c2      d2
  Option 3  c3      d3
  …         …       …

returns the test statistic

Leaks nothing

Function Overloads

D float32 chiSquared(D uint32[[2]] contTable)

D float64 chiSquared(D uint64[[2]] contTable)
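You can reproduce the statistic this function returns in cleartext, which helps when checking results on test data. The following is a Python sketch, not the SecreC implementation (which works on secret-shared values), and the function name chi_squared_independence is hypothetical. It assumes the table layout described above, with one row per option and cases and controls as the two columns. It applies no continuity correction.

```python
def chi_squared_independence(table):
    """Pearson's chi-squared statistic for a k x 2 contingency table.

    Each row is an option; column 0 holds case counts, column 1 control counts.
    No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if option and group (case/control) were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


print(chi_squared_independence([[10, 20], [30, 40]]))  # about 0.7937 (exactly 50/63)
```

For this table, R's chisq.test(matrix(c(10, 30, 20, 40), nrow = 2), correct=FALSE) should report the same X-squared value.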

chiSquared(goodness of fit with probabilities)

Perform Pearson’s chi-squared goodness of fit test.

Detailed Description

D - any protection domain

Supported types - uint32 / uint64

Parameters

observed

- observed frequency of each class

p

- theoretical probability of each class

returns the chi-squared test statistic

Function Overloads

D float32 chiSquared(D uint32[[1]] observed, D float32[[1]] p)

D float64 chiSquared(D uint64[[1]] observed, D float64[[1]] p)
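Here is a cleartext Python sketch of the goodness-of-fit statistic, for comparison with results computed on shared data. The name chi_squared_gof is hypothetical, and the sketch assumes p sums to 1. The expected count of class i is the total number of observations times p[i].

```python
def chi_squared_gof(observed, p):
    """Pearson's goodness-of-fit statistic for observed class counts and
    theoretical class probabilities p (assumed to sum to 1)."""
    n = sum(observed)
    stat = 0.0
    for o, prob in zip(observed, p):
        expected = n * prob  # expected count of this class
        stat += (o - expected) ** 2 / expected
    return stat


print(chi_squared_gof([10, 20, 30], [0.2, 0.3, 0.5]))  # about 0.5556 (5/9)
```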

chiSquared(goodness of fit)

Perform Pearson’s chi-squared goodness of fit test.

Detailed Description

D - any protection domain

Supported types - uint32 / uint64

Parameters

observed

- observed frequency of each class

returns the chi-squared test statistic

Function Overloads

D float32 chiSquared(D uint32[[1]] observed)

D float64 chiSquared(D uint64[[1]] observed)
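This overload takes no probabilities. The Python sketch below assumes that every class is then equally likely, which is the same default R's chisq.test uses when p is omitted. The documentation above does not state this, so treat it as an assumption. The function name is hypothetical.

```python
def chi_squared_gof_uniform(observed):
    """Goodness-of-fit statistic, assuming (hypothetically) equal class probabilities."""
    n = sum(observed)
    expected = n / len(observed)  # same expected count for every class
    return sum((o - expected) ** 2 / expected for o in observed)


print(chi_squared_gof_uniform([10, 20, 30]))  # 10.0
```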

chiSquared(with codebook)

Perform Pearson’s chi-squared test of independence.

Detailed Description

D - any protection domain

Supported types - uint32 / uint64

This version does not apply a continuity correction, so the R equivalent is chisq.test(contingencyTable, correct=FALSE)

Parameters

data

- input vector

cases

- vector indicating which elements of the input vector belong to the first sample

controls

- vector indicating which elements of the input vector belong to the second sample

codeBook

- matrix used for creating the contingency table. The first row lists the values that can occur in the input vector, and the second row gives the class each of these values is assigned to. Class numbers start from 1.

returns the test statistic

Leaks nothing

Function Overloads

D float32 chiSquared(D uint32[[1]] data, D bool[[1]] cases, D bool[[1]] controls, uint32[[2]] codeBook)

D float64 chiSquared(D uint64[[1]] data, D bool[[1]] cases, D bool[[1]] controls, uint64[[2]] codeBook)
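The following Python sketch shows, in cleartext, one way the code book could turn data, cases and controls into a k x 2 contingency table (cases and controls as columns), followed by the usual Pearson statistic without continuity correction. The function names are hypothetical. Skipping values that do not appear in the code book is an assumption, not documented behavior.

```python
def contingency_table(data, cases, controls, code_book):
    """Build a k x 2 table (cases, controls) from a two-row code book.

    code_book[0] lists the values that may occur in data and code_book[1]
    the class (numbered from 1) that each value is assigned to.
    """
    value_to_class = dict(zip(code_book[0], code_book[1]))
    table = [[0, 0] for _ in range(max(code_book[1]))]
    for value, is_case, is_control in zip(data, cases, controls):
        cls = value_to_class.get(value)
        if cls is None:
            continue  # assumption: values missing from the code book are skipped
        if is_case:
            table[cls - 1][0] += 1
        if is_control:
            table[cls - 1][1] += 1
    return table


def chi_squared(table):
    """Pearson's chi-squared statistic without continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2 / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )


data = [1, 3, 3, 1, 2, 1]
cases = [True, True, True, False, False, False]
controls = [False, False, False, True, True, True]
code_book = [[1, 2, 3], [1, 1, 2]]  # values 1 and 2 -> class 1, value 3 -> class 2
table = contingency_table(data, cases, controls, code_book)
print(table)               # [[1, 3], [2, 0]]
print(chi_squared(table))  # 3.0
```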

combinedDegreesOfFreedom

Approximate the degrees of freedom of a linear combination of independent sample variances.

Detailed Description

Uses the Welch-Satterthwaite equation. This is useful for computing the degrees of freedom of a t-test on samples with unequal variances (Welch’s t-test).

D - any protection domain

Supported types - int32 / int64 / float32 / float64

Parameters

data1

- first sample

ia1

- vector indicating which elements of the first sample are available

data2

- second sample

ia2

- vector indicating which elements of the second sample are available

returns the approximated number of degrees of freedom

Leaks the number of true values in ia1 and ia2

Function Overloads

D float32 combinedDegreesOfFreedom(D int32[[1]] data1, D bool[[1]] ia1, D int32[[1]] data2, D bool[[1]] ia2)

D float64 combinedDegreesOfFreedom(D int64[[1]] data1, D bool[[1]] ia1, D int64[[1]] data2, D bool[[1]] ia2)

D float32 combinedDegreesOfFreedom(D float32[[1]] data1, D bool[[1]] ia1, D float32[[1]] data2, D bool[[1]] ia2)

D float64 combinedDegreesOfFreedom(D float64[[1]] data1, D bool[[1]] ia1, D float64[[1]] data2, D bool[[1]] ia2)
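For two samples, the Welch-Satterthwaite equation is

  df = (s1²/n1 + s2²/n2)² / ((s1²/n1)² / (n1 - 1) + (s2²/n2)² / (n2 - 1)),

where s1² and s2² are the sample variances and n1 and n2 are the numbers of available elements. The cleartext Python sketch below uses a hypothetical function name. The availability vectors simply select which elements take part.

```python
from statistics import variance


def combined_degrees_of_freedom(data1, ia1, data2, ia2):
    """Welch-Satterthwaite approximation for two independent samples.

    Only elements whose availability flag (ia1 / ia2) is True are used.
    """
    x = [v for v, ok in zip(data1, ia1) if ok]
    y = [v for v, ok in zip(data2, ia2) if ok]
    a = variance(x) / len(x)  # squared standard error of the first mean
    b = variance(y) / len(y)  # squared standard error of the second mean
    return (a + b) ** 2 / (a ** 2 / (len(x) - 1) + b ** 2 / (len(y) - 1))


# The last element of the first sample is marked unavailable and is ignored.
df = combined_degrees_of_freedom([1, 2, 3, 4, 100], [True] * 4 + [False],
                                 [2, 4, 6, 8, 10], [True] * 5)
print(df)  # about 5.521 (2523/457)
```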

constants

Constants used for specifying the alternative hypothesis.

Constants

int64 ALTERNATIVE_LESS = 0

int64 ALTERNATIVE_GREATER = 1

int64 ALTERNATIVE_TWO_SIDED = 2

mannWhitneyU

Perform Mann-Whitney U test.

Detailed Description

D - shared3p protection domain

Supported types - int32 / int64 / float32 / float64

The t-test requires the populations to be normally distributed. If this cannot be assumed but the data are ordinal, the Mann-Whitney U test (equivalently, the Wilcoxon rank-sum test) can be used instead.

Parameters

sample1

- first sample

ia1

- vector indicating which elements of the first sample are available

sample2

- second sample

ia2

- vector indicating which elements of the second sample are available

correctRanks

- indicates whether tied (equal) sample values should be ranked correctly. If they are not, the test is more conservative but faster.

alternative

- the type of alternative hypothesis, one of the constants ALTERNATIVE_LESS (the mean of sample1 is less than the mean of sample2), ALTERNATIVE_GREATER (the mean of sample1 is greater than the mean of sample2) or ALTERNATIVE_TWO_SIDED (the means of sample1 and sample2 differ)

returns a vector where the first element is the test statistic and the second element is the continuity-corrected z-score. The z-score is a normal approximation and is only accurate when the samples are large and the significance level is not very low.

Leaks the total number of true values in ia1 and ia2

Function Overloads

D float32[[1]] mannWhitneyU(D int32[[1]] sample1, D bool[[1]] ia1, D int32[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)

D float64[[1]] mannWhitneyU(D int64[[1]] sample1, D bool[[1]] ia1, D int64[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)

D float32[[1]] mannWhitneyU(D float32[[1]] sample1, D bool[[1]] ia1, D float32[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)

D float64[[1]] mannWhitneyU(D float64[[1]] sample1, D bool[[1]] ia1, D float64[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
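Here is a cleartext Python sketch of the Mann-Whitney U test, useful for checking results on non-secret test data. Several details are assumptions, not taken from the documentation above:

- the returned statistic is U for sample1;
- tied values receive average ranks (presumably what correctRanks = true does);
- the standard deviation has no tie adjustment;
- the continuity correction follows the convention of R's wilcox.test.

The function names are hypothetical. The constant values are the ones listed under constants.

```python
import math

ALTERNATIVE_LESS, ALTERNATIVE_GREATER, ALTERNATIVE_TWO_SIDED = 0, 1, 2


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    first, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]


def mann_whitney_u(sample1, ia1, sample2, ia2, alternative):
    """Return [U, z] for the available elements of two samples (cleartext sketch)."""
    x = [v for v, ok in zip(sample1, ia1) if ok]
    y = [v for v, ok in zip(sample2, ia2) if ok]
    n1, n2 = len(x), len(y)
    ranks = average_ranks(x + y)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic of sample1
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie adjustment
    diff = u - mu
    # Continuity correction of half a unit, with R's wilcox.test sign convention.
    if alternative == ALTERNATIVE_LESS:
        correction = -0.5
    elif alternative == ALTERNATIVE_GREATER:
        correction = 0.5
    else:
        correction = math.copysign(0.5, diff) if diff != 0 else 0.0
    return [u, (diff - correction) / sigma]


print(mann_whitney_u([1, 2, 3], [True] * 3, [4, 5, 6], [True] * 3,
                     ALTERNATIVE_TWO_SIDED))  # [0.0, about -1.746]
```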

multipleTesting

Perform the Benjamini-Hochberg procedure for false discovery rate control.

Detailed Description

When many hypotheses are tested on the same dataset (for example, one test per variable), this procedure helps avoid false discoveries (incorrectly rejected null hypotheses). It performs comparisons of the form t >= q, where t is a test statistic and q is a critical quantile. It can therefore only be used with tests that reject the null hypothesis in the upper tail, such as the "greater" t-test, which checks whether the mean of the first distribution exceeds the mean of the second.

D - shared3p protection domain

Parameters

statistics

- vector of calculated test statistics

quantiles

- vector of critical values of the test statistics. The i-th value should be the critical value at significance level i / k ⋅ alpha, where k is the number of tests and alpha is the overall significance level. Compute it with the quantile function Q of the distribution of the test statistic, taking care to apply Q the way your test requires. For example, the "greater" t-test rejects the null hypothesis when 1 - F(t) < i / k ⋅ alpha, where F is the cumulative distribution function, so the i-th critical value is Q(1 - i / k ⋅ alpha), not Q(i / k ⋅ alpha).

returns the indices of the tests for which the null hypothesis is rejected

Leaks the ordering of significant test statistics (but not the statistics)

Function Overloads

D uint multipleTesting(D float32[[1]] statistics, float32[[1]] quantiles)

D uint multipleTesting(D float64[[1]] statistics, float64[[1]] quantiles)
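Here is a cleartext Python sketch of the Benjamini-Hochberg step-up procedure as described above:

1. Sort the statistics in decreasing order.
2. Find the largest rank i whose statistic is at least the i-th critical value.
3. Reject the hypotheses with the i largest statistics.

The function name and the use of 0-based indices are assumptions; the documentation above does not say how the returned indices are numbered. The example uses upper-tail z-tests so that the critical values can come from Python's statistics.NormalDist.

```python
from statistics import NormalDist


def benjamini_hochberg(statistics, quantiles):
    """Indices (0-based) of the tests whose null hypothesis is rejected.

    quantiles[i - 1] is the critical value for the i-th largest statistic;
    rejection uses comparisons of the form t >= q (upper tail).
    """
    order = sorted(range(len(statistics)), key=lambda j: statistics[j], reverse=True)
    last = 0  # largest rank i whose statistic reaches its critical value
    for i, j in enumerate(order, start=1):
        if statistics[j] >= quantiles[i - 1]:
            last = i
    return sorted(order[:last])


# Four upper-tail z-tests at alpha = 0.05; critical values Q(1 - i / k * alpha).
alpha, k = 0.05, 4
quantiles = [NormalDist().inv_cdf(1 - i / k * alpha) for i in range(1, k + 1)]
print(benjamini_hochberg([0.5, 3.0, 1.9, 2.1], quantiles))  # [1, 2, 3]
```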

pairedTTest

Perform paired t-tests.

Detailed Description

D - shared3p protection domain

Supported types - int32 / int64 / float32 / float64

Parameters

sample1

- first sample

sample2

- second sample

filter

- vector indicating which pairs of elements (the i-th elements of sample1 and sample2) to include in computing the t-value

constant

- hypothesized difference of means (set to 0 if testing for equal means)

returns the t-value

Leaks nothing

Function Overloads

D float32 pairedTTest(D int32[[1]] sample1, D int32[[1]] sample2, D bool[[1]] filter, float32 constant)

D float64 pairedTTest(D int64[[1]] sample1, D int64[[1]] sample2, D bool[[1]] filter, float64 constant)

D float32 pairedTTest(D float32[[1]] sample1, D float32[[1]] sample2, D bool[[1]] filter, float32 constant)

D float64 pairedTTest(D float64[[1]] sample1, D float64[[1]] sample2, D bool[[1]] filter, float64 constant)
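Here is a cleartext Python sketch of the paired t-value. It takes the mean of the selected differences, subtracts the hypothesized difference, and divides by the standard error of the differences. Taking the differences as sample1 - sample2 is an assumption, and the function name is hypothetical. The parameter is called filter_ to avoid shadowing Python's built-in filter.

```python
import math
from statistics import mean, stdev


def paired_t_test(sample1, sample2, filter_, constant):
    """t-value of the paired differences sample1 - sample2.

    Only pairs whose filter_ flag is True are used; constant is the
    hypothesized mean difference.
    """
    d = [a - b for a, b, keep in zip(sample1, sample2, filter_) if keep]
    return (mean(d) - constant) / (stdev(d) / math.sqrt(len(d)))


# The last pair is excluded by the filter.
print(paired_t_test([5, 7, 9, 11, 100], [4, 5, 6, 7, 0], [True] * 4 + [False], 0.0))
# about 3.873
```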

tTest

Perform t-tests.

Detailed Description

D - shared3p protection domain

Supported types - int32 / int64 / float32 / float64

Parameters

data

- input vector

cases

- vector indicating which elements of the input vector belong to the first sample

controls

- vector indicating which elements of the input vector belong to the second sample

variancesEqual

- indicates if the variances of the two samples should be treated as equal

returns the test statistic

Leaks nothing

Function Overloads

D float32 tTest(D int32[[1]] data, D bool[[1]] cases, D bool[[1]] controls, bool variancesEqual)

D float64 tTest(D int64[[1]] data, D bool[[1]] cases, D bool[[1]] controls, bool variancesEqual)

D float32 tTest(D float32[[1]] data, D bool[[1]] cases, D bool[[1]] controls, bool variancesEqual)

D float64 tTest(D float64[[1]] data, D bool[[1]] cases, D bool[[1]] controls, bool variancesEqual)
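Here is a cleartext Python sketch of the statistic for both settings of variancesEqual. With equal variances it uses the pooled variance of Student's t-test; otherwise it uses the separate variances of Welch's t-test. The function name is hypothetical. Treating cases as the first sample follows the parameter descriptions above.

```python
import math
from statistics import mean, variance


def t_test(data, cases, controls, variances_equal):
    """Two-sample t statistic of the cases (first sample) against the controls."""
    x = [v for v, c in zip(data, cases) if c]
    y = [v for v, c in zip(data, controls) if c]
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)  # sample variances (n - 1 denominator)
    if variances_equal:
        # Student's t-test: pooled variance estimate.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Welch's t-test: separate variance estimates.
        se = math.sqrt(v1 / n1 + v2 / n2)
    return (mean(x) - mean(y)) / se


data = [1, 2, 3, 4, 2, 4, 6, 8, 10]
cases = [True] * 4 + [False] * 5
controls = [False] * 4 + [True] * 5
print(t_test(data, cases, controls, True))   # about -2.058
print(t_test(data, cases, controls, False))  # about -2.251
```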

tTest(two sample vectors)

Perform t-tests.

Detailed Description

D - shared3p protection domain

Supported types - int32 / int64 / float32 / float64

Parameters

data1

- first sample

ia1

- vector indicating which elements of the first sample are available

data2

- second sample

ia2

- vector indicating which elements of the second sample are available

variancesEqual

- indicates if the variances of the two samples should be treated as equal

returns the test statistic

Leaks the number of true values in ia1 and ia2

Function Overloads

D float32 tTest(D int32[[1]] data1, D bool[[1]] ia1, D int32[[1]] data2, D bool[[1]] ia2, bool variancesEqual)

D float64 tTest(D int64[[1]] data1, D bool[[1]] ia1, D int64[[1]] data2, D bool[[1]] ia2, bool variancesEqual)

D float32 tTest(D float32[[1]] data1, D bool[[1]] ia1, D float32[[1]] data2, D bool[[1]] ia2, bool variancesEqual)

D float64 tTest(D float64[[1]] data1, D bool[[1]] ia1, D float64[[1]] data2, D bool[[1]] ia2, bool variancesEqual)
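This overload appears to differ from the cases/controls version only in how the two samples are passed. Assuming both compute the same statistic, the hypothetical helper below shows how two samples with availability vectors correspond to a single input vector with cases and controls masks.

```python
def to_cases_controls(data1, ia1, data2, ia2):
    """Concatenate two samples and derive cases/controls masks for them."""
    data = list(data1) + list(data2)
    cases = list(ia1) + [False] * len(data2)     # available elements of data1
    controls = [False] * len(data1) + list(ia2)  # available elements of data2
    return data, cases, controls


print(to_cases_controls([1, 2], [True, False], [3], [True]))
# ([1, 2, 3], [True, False, False], [False, False, True])
```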

wilcoxonRankSum

Perform Wilcoxon rank sum tests.

Detailed Description

D - shared3p protection domain

Supported types - int32 / int64 / float32 / float64

The t-test requires the populations to be normally distributed. If this cannot be assumed but the data are ordinal, the Mann-Whitney U test (equivalently, the Wilcoxon rank-sum test) can be used instead.

Parameters

sample1

- first sample

ia1

- vector indicating which elements of the first sample are available

sample2

- second sample

ia2

- vector indicating which elements of the second sample are available

correctRanks

- indicates whether tied (equal) sample values should be ranked correctly. If they are not, the test is more conservative but faster.

alternative

- the type of alternative hypothesis, one of the constants ALTERNATIVE_LESS (the mean of sample1 is less than the mean of sample2), ALTERNATIVE_GREATER (the mean of sample1 is greater than the mean of sample2) or ALTERNATIVE_TWO_SIDED (the means of sample1 and sample2 differ)

returns a vector where the first element is the test statistic and the second element is the continuity-corrected z-score. The z-score is a normal approximation and is only reliable when both samples have at least 5 elements.

Leaks the total number of true values in ia1 and ia2

Function Overloads

D float32[[1]] wilcoxonRankSum(D int32[[1]] sample1, D bool[[1]] ia1, D int32[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)

D float64[[1]] wilcoxonRankSum(D int64[[1]] sample1, D bool[[1]] ia1, D int64[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)

D float32[[1]] wilcoxonRankSum(D float32[[1]] sample1, D bool[[1]] ia1, D float32[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)

D float64[[1]] wilcoxonRankSum(D float64[[1]] sample1, D bool[[1]] ia1, D float64[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
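The Wilcoxon rank-sum statistic W is the sum of the ranks of sample1 in the pooled data. Here is a cleartext Python sketch with hypothetical function names. It assumes:

- ties receive average ranks (presumably what correctRanks = true does);
- the standard deviation has no tie adjustment;
- the continuity correction follows the convention of R's wilcox.test.

The documentation above does not say whether the library returns W itself or the equivalent U statistic.

```python
import math

ALTERNATIVE_LESS, ALTERNATIVE_GREATER, ALTERNATIVE_TWO_SIDED = 0, 1, 2


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    first, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]


def wilcoxon_rank_sum(sample1, ia1, sample2, ia2, alternative):
    """Return [W, z] for the available elements of two samples (cleartext sketch)."""
    x = [v for v, ok in zip(sample1, ia1) if ok]
    y = [v for v, ok in zip(sample2, ia2) if ok]
    n1, n2 = len(x), len(y)
    w = sum(average_ranks(x + y)[:n1])  # rank sum of sample1
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie adjustment
    diff = w - mu
    if alternative == ALTERNATIVE_LESS:
        correction = -0.5
    elif alternative == ALTERNATIVE_GREATER:
        correction = 0.5
    else:
        correction = math.copysign(0.5, diff) if diff != 0 else 0.0
    return [w, (diff - correction) / sigma]


print(wilcoxon_rank_sum([1, 2, 3, 4, 5], [True] * 5, [6, 7, 8, 9, 10], [True] * 5,
                        ALTERNATIVE_TWO_SIDED))  # [15.0, about -2.507]
```

Subtracting n1(n1 + 1) / 2 from W gives the Mann-Whitney U statistic. The z-score is unchanged, which is why the two tests are interchangeable.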

wilcoxonSignedRank

Perform Wilcoxon signed rank tests.

Detailed Description

The paired t-test requires the differences between paired observations to be normally distributed. If this cannot be assumed but the data are ordinal, the Wilcoxon signed rank test can be used instead.

D - shared3p protection domain

Supported types - int32 / int64 / float32 / float64

Parameters

sample1

- first sample

sample2

- second sample

filter

- vector indicating which pairs of elements (the i-th elements of sample1 and sample2) to include in computing the statistic

correctRanks

- indicates whether tied (equal) sample values should be ranked correctly. If they are not, the test is more conservative but faster.

alternative

- the type of alternative hypothesis, one of the constants ALTERNATIVE_LESS (the mean of sample1 is less than the mean of sample2), ALTERNATIVE_GREATER (the mean of sample1 is greater than the mean of sample2) or ALTERNATIVE_TWO_SIDED (the means of sample1 and sample2 differ)

returns a vector where the first element is the test statistic and the second element is the continuity-corrected z-score. The z-score is a normal approximation and is inaccurate when there are fewer than 10 pairs with a non-zero difference.

Leaks the number of true values in filter minus the number of pairs where the difference is zero

Function Overloads

D float32[[1]] wilcoxonSignedRank(D int32[[1]] sample1, D int32[[1]] sample2, D bool[[1]] filter, bool correctRanks, int64 alternative)

D float64[[1]] wilcoxonSignedRank(D int64[[1]] sample1, D int64[[1]] sample2, D bool[[1]] filter, bool correctRanks, int64 alternative)

D float32[[1]] wilcoxonSignedRank(D float32[[1]] sample1, D float32[[1]] sample2, D bool[[1]] filter, bool correctRanks, int64 alternative)

D float64[[1]] wilcoxonSignedRank(D float64[[1]] sample1, D float64[[1]] sample2, D bool[[1]] filter, bool correctRanks, int64 alternative)
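Here is a cleartext Python sketch of the signed rank test. It drops pairs with a zero difference (matching the leakage description above), ranks the absolute differences, and returns the rank sum of the positive differences as the statistic. The following choices are all assumptions:

- differences are taken as sample1 - sample2;
- the statistic is the positive rank sum;
- ties receive average ranks (presumably what correctRanks = true does);
- the continuity correction follows R's convention, without tie adjustment.

The function names are hypothetical, and filter_ avoids shadowing Python's built-in filter.

```python
import math

ALTERNATIVE_LESS, ALTERNATIVE_GREATER, ALTERNATIVE_TWO_SIDED = 0, 1, 2


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    first, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]


def wilcoxon_signed_rank(sample1, sample2, filter_, alternative):
    """Return [W+, z] for the selected pairs (cleartext sketch)."""
    # Differences of the selected pairs; zero differences are dropped.
    d = [a - b for a, b, keep in zip(sample1, sample2, filter_) if keep and a != b]
    n = len(d)
    ranks = average_ranks([abs(v) for v in d])
    w = sum(r for r, v in zip(ranks, d) if v > 0)  # rank sum of positive differences
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # no tie adjustment
    diff = w - mu
    if alternative == ALTERNATIVE_LESS:
        correction = -0.5
    elif alternative == ALTERNATIVE_GREATER:
        correction = 0.5
    else:
        correction = math.copysign(0.5, diff) if diff != 0 else 0.0
    return [w, (diff - correction) / sigma]


# The last pair has a zero difference and is dropped.
print(wilcoxon_signed_rank([5, 7, 9, 11, 3], [4, 5, 6, 7, 3], [True] * 5,
                           ALTERNATIVE_TWO_SIDED))  # [10.0, about 1.643]
```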