shared3p_statistics_testing.sc
Module with statistical hypothesis tests.
Functions:
chiSquared
Perform Pearson’s chi-squared test of independence.
Detailed Description
D - any protection domain
Supported types - int32 / int64
This version does not apply continuity correction, so the R equivalent is chisq.test(contingencyTable, correct=FALSE).
Parameters
contTable - contingency table in the format:
| | Cases | Controls |
| Option 1 | c1 | d1 |
| Option 2 | c2 | d2 |
| Option 3 | c3 | d3 |
| … | … | … |
Returns the test statistic.
Leaks nothing.
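For orientation, the statistic described above can be sketched on plaintext data. This is a Python reference sketch, not SecreC and not the module's implementation; the function name `chi_squared_independence` and the list-of-rows layout are assumptions chosen to mirror the contingency table format.

```python
def chi_squared_independence(table):
    """Pearson chi-squared statistic without continuity correction.

    `table` holds one row per option, each row being [cases, controls].
    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of option and group.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


example_table = [[10, 20], [20, 10]]
example_statistic = chi_squared_independence(example_table)
```

The result should agree with the statistic that R reports for `chisq.test(matrix, correct=FALSE)` on the same counts.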
chiSquared(goodness of fit with probabilities)
Perform Pearson’s chi-squared goodness of fit test.
chiSquared(goodness of fit)
Perform Pearson’s chi-squared goodness of fit test.
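The two goodness of fit variants are not described further here, so the following plaintext Python sketch only illustrates the general statistic. It assumes that the variant without probabilities tests against a uniform distribution; that reading, the function name and the parameter names are assumptions, not taken from the module.

```python
def chi_squared_goodness_of_fit(observed, probabilities=None):
    """Pearson goodness of fit statistic for observed category counts."""
    n = sum(observed)
    if probabilities is None:
        # Assumption: without explicit probabilities, all categories are equally likely.
        probabilities = [1.0 / len(observed)] * len(observed)
    return sum(
        (count - n * p) ** 2 / (n * p)
        for count, p in zip(observed, probabilities)
    )


uniform_statistic = chi_squared_goodness_of_fit([10, 20, 30])
weighted_statistic = chi_squared_goodness_of_fit([10, 20, 30], [0.25, 0.25, 0.5])
```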
chiSquared(with codebook)
Perform Pearson’s chi-squared test of independence.
Detailed Description
D - any protection domain
Supported types - uint32 / uint64
This version does not apply continuity correction, so the R equivalent is chisq.test(contingencyTable, correct=FALSE).
Parameters
data - input vector
cases - vector indicating which elements of the input vector belong to the first sample
controls - vector indicating which elements of the input vector belong to the second sample
codeBook - matrix used for creating the contingency table. The first row lists the values expected in the input vector and the second row gives the class that each of these values is assigned to. Class numbers should start from 1.
Returns the test statistic.
Leaks nothing.
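How a codebook turns the input vector into a contingency table can be sketched on plaintext data as follows. This is an illustrative Python sketch, not the module's code: the name `contingency_from_codebook` is hypothetical, and skipping values that are missing from the codebook is an assumption.

```python
def contingency_from_codebook(data, cases, controls, code_book):
    """Build a [cases, controls] count table with one row per class."""
    values, classes = code_book  # first row: expected values, second row: classes
    class_of = dict(zip(values, classes))
    table = [[0, 0] for _ in range(max(classes))]
    for x, is_case, is_control in zip(data, cases, controls):
        if x not in class_of:
            continue  # assumption: values absent from the codebook are ignored
        row = class_of[x] - 1  # classes start from 1
        if is_case:
            table[row][0] += 1
        if is_control:
            table[row][1] += 1
    return table


data = [1, 2, 3, 1, 2, 3]
cases = [True, True, True, False, False, False]
controls = [False, False, False, True, True, True]
code_book = [[1, 2, 3], [1, 1, 2]]  # values 1 and 2 share class 1, value 3 is class 2
table = contingency_from_codebook(data, cases, controls, code_book)
```

The resulting table has the same layout as the contingency table accepted by the plain chiSquared variant.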
combinedDegreesOfFreedom
Approximate the degrees of freedom of a linear combination of independent sample variances.
Detailed Description
Uses the Welch-Satterthwaite equation. It is useful for calculating the degrees of freedom when performing a t-test on samples with unequal variances (Welch’s t-test).
D - any protection domain
Supported types - int32 / int64 / float32 / float64
Parameters
data1 - first sample
ia1 - vector indicating which elements of the first sample are available
data2 - second sample
ia2 - vector indicating which elements of the second sample are available
Returns the approximated number of degrees of freedom.
Leaks the number of true values in ia1 and ia2.
Function Overloads
D float32 combinedDegreesOfFreedom(D int32[[1]] data1, D bool[[1]] ia1, D int32[[1]] data2, D bool[[1]] ia2)
D float64 combinedDegreesOfFreedom(D int64[[1]] data1, D bool[[1]] ia1, D int64[[1]] data2, D bool[[1]] ia2)
D float32 combinedDegreesOfFreedom(D float32[[1]] data1, D bool[[1]] ia1, D float32[[1]] data2, D bool[[1]] ia2)
D float64 combinedDegreesOfFreedom(D float64[[1]] data1, D bool[[1]] ia1, D float64[[1]] data2, D bool[[1]] ia2)
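The Welch-Satterthwaite equation for two samples can be written out on plaintext data as below. This is a Python reference sketch under the assumption that the availability vectors simply select which elements take part; the function name is illustrative and not part of the module.

```python
from statistics import variance


def combined_degrees_of_freedom(data1, ia1, data2, ia2):
    """Welch-Satterthwaite approximation for two independent samples."""
    x = [v for v, available in zip(data1, ia1) if available]
    y = [v for v, available in zip(data2, ia2) if available]
    a = variance(x) / len(x)  # s1^2 / n1
    b = variance(y) / len(y)  # s2^2 / n2
    return (a + b) ** 2 / (a ** 2 / (len(x) - 1) + b ** 2 / (len(y) - 1))


# The last element of the first sample is marked unavailable and is skipped.
df = combined_degrees_of_freedom(
    [1, 2, 3, 4, 99], [True, True, True, True, False],
    [2, 4, 6, 8], [True, True, True, True],
)
```

Each sample needs at least two available elements, otherwise the sample variance is undefined.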
constants
Constants used for specifying the alternative hypothesis.
Constants
mannWhitneyU
Perform Mann-Whitney U test.
Detailed Description
D - shared3p protection domain
Supported types - int32 / int64 / float32 / float64
The t-test requires the populations to be normally distributed. If the populations cannot be assumed to be normally distributed but the data are ordinal, the Mann-Whitney U or Wilcoxon rank-sum test can be used instead.
Parameters
sample1 - first sample
ia1 - vector indicating which elements of the first sample are available
sample2 - second sample
ia2 - vector indicating which elements of the second sample are available
correctRanks - indicates whether equal sample values (ties) should be ranked correctly. If they are not, the test is more conservative but faster.
alternative - the type of alternative hypothesis: less (the mean of sample1 is less than the mean of sample2), greater (the mean of sample1 is greater than the mean of sample2) or two-sided (the means of sample1 and sample2 are different)
Returns a vector whose first element is the test statistic and whose second element is the continuity-corrected z-score. The z-score is an approximation and is only accurate when the samples are large and the significance level is not very low.
Leaks the sum of the number of true values in ia1 and ia2.
Function Overloads
D float32[[1]] mannWhitneyU(D int32[[1]] sample1, D bool[[1]] ia1, D int32[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
D float64[[1]] mannWhitneyU(D int64[[1]] sample1, D bool[[1]] ia1, D int64[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
D float32[[1]] mannWhitneyU(D float32[[1]] sample1, D bool[[1]] ia1, D float32[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
D float64[[1]] mannWhitneyU(D float64[[1]] sample1, D bool[[1]] ia1, D float64[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
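A plaintext sketch may help when checking results against another tool. The Python code below computes the U statistic of the first sample with tied values given their average rank, plus a normal-approximation z-score. Several details are assumptions rather than documented behaviour of the module: which of the two U values is reported, that the continuity correction shrinks the deviation by 0.5 towards zero, and that no tie correction is applied to the variance. The alternative hypothesis is left out of the sketch.

```python
import math


def midranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def mann_whitney_u(sample1, ia1, sample2, ia2):
    """Return (U of the first sample, continuity-corrected z-score)."""
    x = [v for v, available in zip(sample1, ia1) if available]
    y = [v for v, available in zip(sample2, ia2) if available]
    n1, n2 = len(x), len(y)
    ranks = midranks(x + y)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    deviation = u1 - mean
    # Assumed continuity correction: move the deviation 0.5 towards zero.
    corrected = math.copysign(max(abs(deviation) - 0.5, 0.0), deviation)
    return u1, corrected / sd


u, z = mann_whitney_u([1, 2, 3], [True] * 3, [4, 5, 6], [True] * 3)
```

The samples in this usage example are far too small for the z-score to be meaningful; they are only convenient for checking the arithmetic by hand.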
multipleTesting
Perform the Benjamini-Hochberg procedure for false discovery rate control.
Detailed Description
If multiple variables of a dataset are used for testing a hypothesis, this procedure helps avoid false discoveries (incorrectly rejected null hypotheses). The procedure does comparisons of the form t >= q, where t is the test statistic and q is a critical quantile, so it can only be used for tests that have this (upper tail) form of comparison. An example is the "greater" hypothesis t-test, which checks if the mean of the first distribution exceeds the mean of the second.
D - shared3p protection domain
Parameters
statistics - vector of calculated test statistics
quantiles - vector of critical values of the test statistics. The i-th value should be Q(i / k ⋅ alpha), where k is the number of tests, alpha is the significance level and Q is the quantile function of the distribution of the test statistic. Note that the quantile function has to be applied correctly for the test in question. For example, for the "greater" hypothesis t-test the decision rule is 1 - P(t) < alpha, where P is the cumulative distribution function. This is equivalent to t > Q(1 - alpha), so the quantiles should be calculated as Q(1 - p), not Q(p).
Returns the indices of the tests for which the null hypothesis is rejected.
Leaks the ordering of significant test statistics (but not the statistics themselves).
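The comparison scheme can be sketched on plaintext values as follows. This Python example is illustrative only: it assumes an upper-tail test whose statistic is standard normal under the null hypothesis, so the i-th critical value is taken as Q(1 - i / k ⋅ alpha) from `statistics.NormalDist`. The function name and the choice to return indices in order of significance are assumptions.

```python
from statistics import NormalDist


def benjamini_hochberg(test_statistics, quantiles):
    """Indices of rejected tests, most significant first (step-up rule)."""
    # Largest statistic first, so rank 1 is the most significant test.
    order = sorted(
        range(len(test_statistics)),
        key=lambda i: test_statistics[i],
        reverse=True,
    )
    last_rejected = 0
    for rank, index in enumerate(order, start=1):
        if test_statistics[index] >= quantiles[rank - 1]:
            last_rejected = rank
    # Reject every test up to the largest rank that passed its comparison.
    return order[:last_rejected]


alpha = 0.05
z_statistics = [3.0, 0.5, 2.5, 1.0]
k = len(z_statistics)
critical_values = [NormalDist().inv_cdf(1 - i / k * alpha) for i in range(1, k + 1)]
rejected = benjamini_hochberg(z_statistics, critical_values)
```

With these inputs the two largest statistics exceed their critical values, so the tests at indices 0 and 2 are rejected.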
pairedTTest
Perform paired t-tests.
Detailed Description
D - shared3p protection domain
Supported types - int32 / int64 / float32 / float64
Parameters
sample1 - first sample
sample2 - second sample
filter - vector indicating which elements of the samples to include in computing the t-value
constant - hypothesized difference of means (set to 0 if testing for equal means)
Returns the t-value.
Leaks nothing.
Function Overloads
D float32 pairedTTest(D int32[[1]] sample1, D int32[[1]] sample2, D bool[[1]] filter, float32 constant)
D float64 pairedTTest(D int64[[1]] sample1, D int64[[1]] sample2, D bool[[1]] filter, float64 constant)
D float32 pairedTTest(D float32[[1]] sample1, D float32[[1]] sample2, D bool[[1]] filter, float32 constant)
D float64 pairedTTest(D float64[[1]] sample1, D float64[[1]] sample2, D bool[[1]] filter, float64 constant)
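The t-value of a paired test is the mean of the pairwise differences, shifted by the hypothesized difference and divided by its standard error. A plaintext Python sketch follows; the function name and the `filter_` spelling are illustrative and not part of the module.

```python
import math
from statistics import mean, stdev


def paired_t_test(sample1, sample2, filter_, constant=0.0):
    """t-value for the mean pairwise difference against `constant`."""
    diffs = [a - b for a, b, keep in zip(sample1, sample2, filter_) if keep]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return (mean(diffs) - constant) / standard_error


# The last pair is filtered out, leaving the differences 1, 2 and 3.
t_value = paired_t_test([5, 7, 9, 100], [4, 5, 6, 0], [True, True, True, False])
```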
tTest
Perform t-tests.
Detailed Description
D - shared3p protection domain
Supported types - int32 / int64 / float32 / float64
Parameters
data - input vector
cases - vector indicating which elements of the input vector belong to the first sample
controls - vector indicating which elements of the input vector belong to the second sample
variancesEqual - indicates if the variances of the two samples should be treated as equal
Returns the test statistic.
Leaks nothing.
Function Overloads
D float32 tTest(D int32[[1]] data, D bool[[1]] cases, D bool[[1]] controls, bool variancesEqual)
D float64 tTest(D int64[[1]] data, D bool[[1]] cases, D bool[[1]] controls, bool variancesEqual)
D float32 tTest(D float32[[1]] data, D bool[[1]] cases, D bool[[1]] controls, bool variancesEqual)
D float64 tTest(D float64[[1]] data, D bool[[1]] cases, D bool[[1]] controls, bool variancesEqual)
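The effect of variancesEqual can be sketched on plaintext data: with equal variances the standard error uses a pooled variance (Student's t-test), otherwise each sample contributes its own variance (Welch's t-test). The Python code below is a reference sketch under that standard reading; the function name is illustrative and the module's internals may differ.

```python
import math
from statistics import mean, variance


def t_test(data, cases, controls, variances_equal):
    """Two-sample t statistic for the elements selected by the two masks."""
    x = [v for v, selected in zip(data, cases) if selected]
    y = [v for v, selected in zip(data, controls) if selected]
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)
    if variances_equal:
        # Student's t-test: pooled variance estimate.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        standard_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Welch's t-test: separate variance estimates.
        standard_error = math.sqrt(v1 / n1 + v2 / n2)
    return (mean(x) - mean(y)) / standard_error


data = [1, 2, 3, 4, 2, 4, 6, 8]
cases = [True] * 4 + [False] * 4
controls = [False] * 4 + [True] * 4
t_student = t_test(data, cases, controls, True)
t_welch = t_test(data, cases, controls, False)
```

Because both samples have the same size here, the two variants give the same statistic; they differ in general when the sample sizes are unequal, and the degrees of freedom differ in either case.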
tTest(two sample vectors)
Perform t-tests.
Detailed Description
D - shared3p protection domain
Supported types - int32 / int64 / float32 / float64
Parameters
data1 - first sample
ia1 - vector indicating which elements of the first sample are available
data2 - second sample
ia2 - vector indicating which elements of the second sample are available
variancesEqual - indicates if the variances of the two samples should be treated as equal
Returns the test statistic.
Leaks the number of true values in ia1 and ia2.
Function Overloads
D float32 tTest(D int32[[1]] data1, D bool[[1]] ia1, D int32[[1]] data2, D bool[[1]] ia2, bool variancesEqual)
D float64 tTest(D int64[[1]] data1, D bool[[1]] ia1, D int64[[1]] data2, D bool[[1]] ia2, bool variancesEqual)
D float32 tTest(D float32[[1]] data1, D bool[[1]] ia1, D float32[[1]] data2, D bool[[1]] ia2, bool variancesEqual)
D float64 tTest(D float64[[1]] data1, D bool[[1]] ia1, D float64[[1]] data2, D bool[[1]] ia2, bool variancesEqual)
wilcoxonRankSum
Perform Wilcoxon rank sum tests.
Detailed Description
D - shared3p protection domain
Supported types - int32 / int64 / float32 / float64
The t-test requires the populations to be normally distributed. If the populations cannot be assumed to be normally distributed but the data are ordinal, the Mann-Whitney U or Wilcoxon rank-sum test can be used instead.
Parameters
sample1 - first sample
ia1 - vector indicating which elements of the first sample are available
sample2 - second sample
ia2 - vector indicating which elements of the second sample are available
correctRanks - indicates whether equal sample values (ties) should be ranked correctly. If they are not, the test is more conservative but faster.
alternative - the type of alternative hypothesis: less (the mean of sample1 is less than the mean of sample2), greater (the mean of sample1 is greater than the mean of sample2) or two-sided (the means of sample1 and sample2 are different)
Returns a vector whose first element is the test statistic and whose second element is the continuity-corrected z-score. The z-score is an approximation and is only accurate when both samples have at least 5 elements.
Leaks the sum of the number of true values in ia1 and ia2.
Function Overloads
D float32[[1]] wilcoxonRankSum(D int32[[1]] sample1, D bool[[1]] ia1, D int32[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
D float64[[1]] wilcoxonRankSum(D int64[[1]] sample1, D bool[[1]] ia1, D int64[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
D float32[[1]] wilcoxonRankSum(D float32[[1]] sample1, D bool[[1]] ia1, D float32[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
D float64[[1]] wilcoxonRankSum(D float64[[1]] sample1, D bool[[1]] ia1, D float64[[1]] sample2, D bool[[1]] ia2, bool correctRanks, int64 alternative)
wilcoxonSignedRank
Perform Wilcoxon signed rank tests.
Detailed Description
The paired t-test requires the populations to be normally distributed. If the populations cannot be assumed to be normally distributed but the data are ordinal, the Wilcoxon signed rank test can be used instead.
D - shared3p protection domain
Supported types - int32 / int64 / float32 / float64
Parameters
sample1 - first sample
sample2 - second sample
filter - vector indicating which elements of the samples to include in computing the statistic
correctRanks - indicates whether equal sample values (ties) should be ranked correctly. If they are not, the test is more conservative but faster.
alternative - the type of alternative hypothesis: less (the mean of sample1 is less than the mean of sample2), greater (the mean of sample1 is greater than the mean of sample2) or two-sided (the means of sample1 and sample2 are different)
Returns a vector whose first element is the test statistic and whose second element is the continuity-corrected z-score. The z-score is an approximation and is not reliable when there are fewer than 10 pairs with a non-zero difference.
Leaks the number of true values in filter minus the number of pairs where the difference is zero.
Function Overloads
D float32[[1]] wilcoxonSignedRank(D int32[[1]] sample1, D int32[[1]] sample2, D bool[[1]] filter, bool correctRanks, int64 alternative)
D float64[[1]] wilcoxonSignedRank(D int64[[1]] sample1, D int64[[1]] sample2, D bool[[1]] filter, bool correctRanks, int64 alternative)
D float32[[1]] wilcoxonSignedRank(D float32[[1]] sample1, D float32[[1]] sample2, D bool[[1]] filter, bool correctRanks, int64 alternative)
D float64[[1]] wilcoxonSignedRank(D float64[[1]] sample1, D float64[[1]] sample2, D bool[[1]] filter, bool correctRanks, int64 alternative)
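A plaintext sketch of the signed rank computation is given below in Python. Pairs with zero difference are dropped, which matches the leakage note above. Other details are assumptions: that the reported statistic is the sum of ranks of the positive differences, that tied absolute differences get their average rank, and that the continuity correction shrinks the deviation by 0.5 towards zero. The alternative hypothesis is left out of the sketch.

```python
import math


def wilcoxon_signed_rank(sample1, sample2, filter_):
    """Return (sum of positive ranks, continuity-corrected z-score)."""
    diffs = [
        a - b
        for a, b, keep in zip(sample1, sample2, filter_)
        if keep and a != b  # zero differences carry no sign information
    ]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # average rank for ties
        start = end + 1
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    deviation = w_plus - mean
    # Assumed continuity correction: move the deviation 0.5 towards zero.
    z = math.copysign(max(abs(deviation) - 0.5, 0.0), deviation) / sd
    return w_plus, z


w, z = wilcoxon_signed_rank([5, 3, 8, 6, 7], [4, 5, 5, 6, 3], [True] * 5)
```

The usage example keeps only four non-zero differences so the arithmetic can be checked by hand; with so few pairs the z-score itself is not reliable.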