shared3p_statistics_testing.sc
Module with statistical hypothesis tests.
Functions:
chiSquared
Perform Pearson’s chi-squared test of independence.
Detailed Description
D - any protection domain
Supported types - int32 / int64
This version does not do continuity correction, so the R equivalent is chisq.test(contingencyTable, correct=FALSE).
Parameters
contTable 
 contingency table in the format:
            Cases  Controls
  Option 1  c1     d1
  Option 2  c2     d2
  Option 3  c3     d3
  …         …      …
returns the test statistic 
Leaks nothing
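As a plaintext reference for what the test statistic is, the following minimal Python sketch computes the uncorrected Pearson statistic from a table in the format above. It is not the SecreC implementation; the function name `chi_squared_independence` is hypothetical and the table is assumed to be a list of [cases, controls] rows.

```python
def chi_squared_independence(table):
    """Pearson chi-squared statistic without continuity correction.

    `table` is a list of [cases, controls] rows (an assumed plaintext layout).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic
```

A table whose rows have identical case/control proportions gives a statistic of zero.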
chiSquared(goodness of fit with probabilities)
Perform Pearson’s chi-squared goodness of fit test.
chiSquared(goodness of fit)
Perform Pearson’s chi-squared goodness of fit test.
chiSquared(with codebook)
Perform Pearson’s chi-squared test of independence.
Detailed Description
D - any protection domain
Supported types - uint32 / uint64
This version does not do continuity correction, so the R equivalent is chisq.test(contingencyTable, correct=FALSE).
Parameters
data 
 input vector 
cases 
 vector indicating which elements of the input vector belong to the first sample 
controls 
 vector indicating which elements of the input vector belong to the second sample 
codeBook 
 matrix used for creating the contingency table. The first row contains expected values of the input vector and the second row contains the classes that these values will be put into. The classes should begin with 1. 
returns the test statistic 
Leaks nothing
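The role of the codebook can be illustrated with a small plaintext Python sketch that builds the contingency table from the four arguments. The name `contingency_from_codebook` is hypothetical, and skipping values that do not appear in the codebook is an assumption, not documented behaviour.

```python
def contingency_from_codebook(data, cases, controls, code_book):
    """Build a [cases, controls] table with one row per class.

    `code_book` is a pair of rows: expected values, then their classes (from 1).
    """
    values, classes = code_book
    class_of = dict(zip(values, classes))
    table = [[0, 0] for _ in range(max(classes))]
    for x, is_case, is_control in zip(data, cases, controls):
        if x not in class_of:
            continue  # assumption: values missing from the codebook are ignored
        row = class_of[x] - 1  # classes start at 1, rows at 0
        if is_case:
            table[row][0] += 1
        if is_control:
            table[row][1] += 1
    return table
```

The resulting table has the same layout as the contingency table accepted by the plain chiSquared variant.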
combinedDegreesOfFreedom
Approximate the degrees of freedom of a linear combination of independent sample variances.
Detailed Description
Uses the Welch-Satterthwaite equation. It’s useful for calculating the degrees of freedom when performing a t-test on samples with unequal variances (Welch’s t-test).
D - any protection domain
Supported types - int32 / int64 / float32 / float64
Parameters
data1 
 first sample 
ia1 
 vector indicating which elements of the first sample are available 
data2 
 second sample 
ia2 
 vector indicating which elements of the second sample are available 
returns the approximated number of degrees of freedom 
Leaks the number of true values in ia1 and ia2 
Function Overloads
D float32 combinedDegreesOfFreedom(D int32[[1]] data1, D bool[[1]] ia1, D int32[[1]] data2, D bool[[1]] ia2)
D float64 combinedDegreesOfFreedom(D int64[[1]] data1, D bool[[1]] ia1, D int64[[1]] data2, D bool[[1]] ia2)
D float32 combinedDegreesOfFreedom(D float32[[1]] data1, D bool[[1]] ia1, D float32[[1]] data2, D bool[[1]] ia2)
D float64 combinedDegreesOfFreedom(D float64[[1]] data1, D bool[[1]] ia1, D float64[[1]] data2, D bool[[1]] ia2)
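For reference, the Welch-Satterthwaite approximation for two samples can be sketched in plaintext Python as follows. This is an illustration of the formula under the assumption that the availability vectors simply select the elements to use; the name `combined_degrees_of_freedom` is hypothetical.

```python
from statistics import variance


def combined_degrees_of_freedom(data1, ia1, data2, ia2):
    """Welch-Satterthwaite degrees of freedom for two masked samples."""
    x1 = [x for x, available in zip(data1, ia1) if available]
    x2 = [x for x, available in zip(data2, ia2) if available]
    n1, n2 = len(x1), len(x2)
    # Variance of each sample mean: s^2 / n.
    a = variance(x1) / n1
    b = variance(x2) / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
```

When both samples have the same size n and the same variance, the result reduces to 2(n - 1), the degrees of freedom of the equal-variance t-test.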
constants
Constants used for specifying the alternative hypothesis.
Constants






mannWhitneyU
Perform the Mann-Whitney U test.
Detailed Description
D - shared3p protection domain
Supported types - int32 / int64 / float32 / float64
The t-test requires the populations to be normally distributed. If the populations cannot be assumed to be normally distributed but are ordinal, then the Mann-Whitney U or Wilcoxon rank-sum test can be used instead.
Parameters
sample1 
 first sample 
ia1 
 vector indicating which elements of the first sample are available 
sample2 
 second sample 
ia2 
 vector indicating which elements of the second sample are available 
correctRanks 
 indicates whether equal sample values (ties) should be ranked correctly. If they are not, the test is more conservative but faster.
alternative 
 the type of alternative hypothesis. less - mean of sample1 is less than mean of sample2; greater - mean of sample1 is greater than mean of sample2; two-sided - means of sample1 and sample2 are different
returns a vector where the first element is the test statistic and the second element is the z-score. The z-score is continuity corrected. The z-score is an approximation and is only correct when the samples are large and the significance level is not very low.
Leaks the sum of the number of true values in ia1 and ia2 
Function Overloads
D float32 mannWhitneyU(D int32[[1]] sample1, D bool[[1]] ia1, D int32[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
D float64 mannWhitneyU(D int64[[1]] sample1, D bool[[1]] ia1, D int64[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
D float32 mannWhitneyU(D float32[[1]] sample1, D bool[[1]] ia1, D float32[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
D float64 mannWhitneyU(D float64[[1]] sample1, D bool[[1]] ia1, D float64[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
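The following plaintext Python sketch shows one common definition of the U statistic and a continuity-corrected normal approximation. Which of the two possible U values the library returns, and how its correction depends on the alternative, are not stated here, so the conventions below (U counted for the first sample, correction of 0.5 toward the mean, no tie correction in the variance) are assumptions. The name `mann_whitney_u` is hypothetical.

```python
from math import sqrt


def mann_whitney_u(sample1, sample2):
    """Return (U, z) for sample1 versus sample2 on plaintext data."""
    n1, n2 = len(sample1), len(sample2)
    # U counts pairs where the sample1 value is larger; ties count one half.
    u = sum((x > y) + 0.5 * (x == y) for x in sample1 for y in sample2)
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    # Continuity correction: move U half a unit toward its mean.
    if u > mean:
        corrected = u - 0.5
    elif u < mean:
        corrected = u + 0.5
    else:
        corrected = u
    return u, (corrected - mean) / sd
```

Swapping the two samples replaces U by n1 ⋅ n2 - U and flips the sign of the z-score.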
multipleTesting
Perform the Benjamini-Hochberg procedure for false discovery rate control.
Detailed Description
If multiple variables of a dataset are used for testing a hypothesis, then this procedure will help avoid false discoveries (incorrectly rejected null hypotheses). This procedure does comparisons of the form t >= q where t is the test statistic and q is a critical quantile. This means it can only be used for tests that have that form of comparison (upper tail), for example the "greater" hypothesis t-test, which checks if the mean of the first distribution exceeds the mean of the second.
D - shared3p protection domain
Parameters
statistics 
 vector of calculated test statistics 
quantiles 
 vector of critical values of the test statistics. The i-th value should be Q(i / k ⋅ alpha), where k is the number of tests, alpha is the significance level and Q is the quantile function of the distribution of the test statistic. Note that you have to use the quantile function correctly depending on your test. For example, when using the "greater" hypothesis t-test, the decision is 1 - P(t) < alpha, which is equivalent to t > Q(1 - alpha), so the quantiles should be calculated as Q(1 - p), not Q(p).
returns the indices of the tests for which the null hypothesis is rejected 
Leaks the ordering of significant test statistics (but not the statistics) 
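A plaintext Python sketch of the step-up procedure in this upper-tail form is given below. It assumes that quantiles[i - 1] holds the critical value for level i / k ⋅ alpha and that rejected indices are reported in ascending order; the library may order its result differently. The name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(statistics, quantiles):
    """Return indices of rejected null hypotheses (upper-tail comparisons)."""
    # Most significant statistic first: rank 1 is the largest value.
    order = sorted(range(len(statistics)), key=lambda j: statistics[j], reverse=True)
    last = 0
    for rank, j in enumerate(order, start=1):
        if statistics[j] >= quantiles[rank - 1]:
            last = rank  # keep the largest rank that passes its critical value
    # Step-up rule: reject every hypothesis ranked at or above `last`.
    return sorted(order[:last])
```

Because the rule is step-up, a statistic that fails its own comparison is still rejected when a less significant one passes, as with statistics [3.0, 2.2] and quantiles [3.5, 2.0].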
pairedTTest
Perform paired t-tests.
Detailed Description
D - shared3p protection domain
Supported types - int32 / int64 / float32 / float64
Parameters
sample1 
 first sample 
sample2 
 second sample 
filter 
 vector indicating which elements of the sample to include in computing the t value 
constant 
 hypothesized difference of means (set to 0 if testing for equal means) 
returns the t-value
Leaks nothing
Function Overloads
D float32 pairedTTest(D int32[[1]] sample1, D int32[[1]] sample2, D bool[[1]] filter, float32 constant)
D float64 pairedTTest(D int64[[1]] sample1, D int64[[1]] sample2, D bool[[1]] filter, float64 constant)
D float32 pairedTTest(D float32[[1]] sample1, D float32[[1]] sample2, D bool[[1]] filter, float32 constant)
D float64 pairedTTest(D float64[[1]] sample1, D float64[[1]] sample2, D bool[[1]] filter, float64 constant)
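In plaintext terms, the paired t-value is the mean of the pairwise differences, shifted by the hypothesized difference and divided by its standard error. A minimal Python sketch, with the hypothetical name `paired_t` and the assumption that the filter simply selects pairs:

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(sample1, sample2, filt, constant=0.0):
    """t-value of the paired differences selected by `filt`."""
    d = [a - b for a, b, keep in zip(sample1, sample2, filt) if keep]
    standard_error = stdev(d) / sqrt(len(d))
    return (mean(d) - constant) / standard_error
```

Setting `constant` to the observed mean difference gives a t-value of zero.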
tTest
Perform t-tests.
Detailed Description
D - shared3p protection domain
Supported types - int32 / int64 / float32 / float64
Parameters
data 
 input vector 
cases 
 vector indicating which elements of the input vector belong to the first sample 
controls 
 vector indicating which elements of the input vector belong to the second sample 
variancesEqual 
 indicates if the variances of the two samples should be treated as equal 
returns the test statistic 
Leaks nothing
Function Overloads
D float32 tTest(D int32[[1]] data, D bool[[1]] cases, D bool[[1]] controls, bool variancesEqual)
D float64 tTest(D int64[[1]] data, D bool[[1]] cases, D bool[[1]] controls, bool variancesEqual)
D float32 tTest(D float32[[1]] data, D bool[[1]] cases, D bool[[1]] controls, bool variancesEqual)
D float64 tTest(D float64[[1]] data, D bool[[1]] cases, D bool[[1]] controls, bool variancesEqual)
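The effect of variancesEqual can be seen in a plaintext Python sketch of the two-sample statistic: the pooled variance is used when it is true, and Welch’s standard error otherwise. The name `t_test` is hypothetical and the masks are assumed to simply select the two samples from the input vector.

```python
from math import sqrt
from statistics import mean, variance


def t_test(data, cases, controls, variances_equal):
    """Two-sample t statistic for the samples selected by the two masks."""
    x1 = [x for x, selected in zip(data, cases) if selected]
    x2 = [x for x, selected in zip(data, controls) if selected]
    n1, n2 = len(x1), len(x2)
    v1, v2 = variance(x1), variance(x2)
    if variances_equal:
        # Student's t-test: pool the two sample variances.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Welch's t-test: keep the variances separate.
        standard_error = sqrt(v1 / n1 + v2 / n2)
    return (mean(x1) - mean(x2)) / standard_error
```

For samples of equal size the two variants give the same statistic; they differ in the degrees of freedom used to interpret it.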
tTest(two sample vectors)
Perform t-tests.
Detailed Description
D - shared3p protection domain
Supported types - int32 / int64 / float32 / float64
Parameters
data1 
 first sample 
ia1 
 vector indicating which elements of the first sample are available 
data2 
 second sample 
ia2 
 vector indicating which elements of the second sample are available 
variancesEqual 
 indicates if the variances of the two samples should be treated as equal 
returns the test statistic 
Leaks the number of true values in ia1 and ia2 
Function Overloads
D float32 tTest(D int32[[1]] data1, D bool[[1]] ia1, D int32[[1]] data2, D bool[[1]] ia2, bool variancesEqual)
D float64 tTest(D int64[[1]] data1, D bool[[1]] ia1, D int64[[1]] data2, D bool[[1]] ia2, bool variancesEqual)
D float32 tTest(D float32[[1]] data1, D bool[[1]] ia1, D float32[[1]] data2, D bool[[1]] ia2, bool variancesEqual)
D float64 tTest(D float64[[1]] data1, D bool[[1]] ia1, D float64[[1]] data2, D bool[[1]] ia2, bool variancesEqual)
wilcoxonRankSum
Perform Wilcoxon rank sum tests.
Detailed Description
D - shared3p protection domain
Supported types - int32 / int64 / float32 / float64
The t-test requires the populations to be normally distributed. If the populations cannot be assumed to be normally distributed but are ordinal, then the Mann-Whitney U or Wilcoxon rank-sum test can be used instead.
Parameters
sample1 
 first sample 
ia1 
 vector indicating which elements of the first sample are available 
sample2 
 second sample 
ia2 
 vector indicating which elements of the second sample are available 
correctRanks 
 indicates whether equal sample values (ties) should be ranked correctly. If they are not, the test is more conservative but faster.
alternative 
 the type of alternative hypothesis. less - mean of sample1 is less than mean of sample2; greater - mean of sample1 is greater than mean of sample2; two-sided - means of sample1 and sample2 are different
returns a vector where the first element is the test statistic and the second element is the z-score. The z-score is continuity corrected. The z-score is an approximation and its values are only correct when both samples have at least 5 elements.
Leaks the sum of the number of true values in ia1 and ia2 
Function Overloads
D float32 wilcoxonRankSum(D int32[[1]] sample1, D bool[[1]] ia1, D int32[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
D float64 wilcoxonRankSum(D int64[[1]] sample1, D bool[[1]] ia1, D int64[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
D float32 wilcoxonRankSum(D float32[[1]] sample1, D bool[[1]] ia1, D float32[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
D float64 wilcoxonRankSum(D float64[[1]] sample1, D bool[[1]] ia1, D float64[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
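To illustrate what ranking equal values correctly can mean, the plaintext Python sketch below assigns tied values the average of the ranks they span and sums the ranks of the first sample. Treating correctRanks as average ranking is an assumption, and the names `average_ranks` and `rank_sum` are hypothetical.

```python
def average_ranks(values):
    """1-based ranks where tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_sum(sample1, sample2):
    """Sum of the pooled ranks that belong to sample1."""
    ranks = average_ranks(list(sample1) + list(sample2))
    return sum(ranks[:len(sample1)])
```

With n1 elements in the first sample, the rank sum W and the Mann-Whitney statistic for that sample are related by U = W - n1(n1 + 1) / 2.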
wilcoxonSignedRank
Perform Wilcoxon signed-rank tests.
Detailed Description
The paired t-test requires the populations to be normally distributed. If the populations cannot be assumed to be normally distributed but are ordinal, then the Wilcoxon signed-rank test can be used instead.
D - shared3p protection domain
Supported types - int32 / int64 / float32 / float64
Parameters
sample1 
 first sample 
sample2 
 second sample 
filter 
 vector indicating which elements of the sample to include in computing the statistic 
correctRanks 
 indicates whether equal sample values (ties) should be ranked correctly. If they are not, the test is more conservative but faster.
alternative 
 the type of alternative hypothesis. less - mean of sample1 is less than mean of sample2; greater - mean of sample1 is greater than mean of sample2; two-sided - means of sample1 and sample2 are different
returns a vector where the first element is the test statistic and the second element is the z-score. The z-score is continuity corrected. The z-score is an approximation and is incorrect when there are fewer than 10 pairs with a nonzero difference.
Leaks the number of true values in filter minus the number of pairs where the difference is zero 
Function Overloads
D float32 wilcoxonSignedRank(D int32[[1]] sample1, D int32[[1]] sample2, D bool[[1]] filter, bool correctRanks, int64 alternative)
D float64 wilcoxonSignedRank(D int64[[1]] sample1, D int64[[1]] sample2, D bool[[1]] filter, bool correctRanks, int64 alternative)
D float32 wilcoxonSignedRank(D float32[[1]] sample1, D float32[[1]] sample2, D bool[[1]] filter, bool correctRanks, int64 alternative)
D float64 wilcoxonSignedRank(D float64[[1]] sample1, D float64[[1]] sample2, D bool[[1]] filter, bool correctRanks, int64 alternative)
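A plaintext Python sketch of the signed-rank computation follows. It drops pairs with zero difference, ranks the remaining differences by magnitude and sums the ranks of the positive ones. Which form of the statistic the library returns is not stated here, so using the positive rank sum is an assumption, as is the simplification that the absolute differences are distinct. The name `signed_rank` is hypothetical.

```python
def signed_rank(sample1, sample2, filt):
    """Return (W+, n): positive rank sum and number of nonzero differences."""
    # Keep only filtered pairs whose difference is nonzero.
    d = [a - b for a, b, keep in zip(sample1, sample2, filt) if keep and a != b]
    by_magnitude = sorted(d, key=abs)
    # Assumes distinct magnitudes, so plain 1-based positions serve as ranks.
    w_plus = sum(rank for rank, diff in enumerate(by_magnitude, start=1) if diff > 0)
    return w_plus, len(d)
```

The second returned value is the count of filtered pairs with a nonzero difference, which is the quantity the leakage note refers to.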