aggregate
Description
Compute summaries of subsets of data. The result table will also contain the grouped key columns. If the ds and table arguments are not specified a temporary table is created.
Note that the “count” function does not require a data vector. Average is not supported on logical vectors. Average and sum are not supported on bitwise-shared (xor) types.
Note that the order of rows in the output is not deterministic.
Usage
aggregate(key.columns, data.columns, functions)
aggregate(key.columns, data.columns, functions, ds=NULL, table=NULL, names=NULL, overwrite=FALSE)
Arguments
by |
a private vector or a list of private vectors. The groups will be formed based on these vectors. The vectors must have an integral type. |
data |
a list of private vectors. Summaries will be computed from these vectors. |
funs |
a list of strings with names of the functions that will be used to compute summaries. The possible values are “avg” (average), “count” (counts the number of values in a group), “first”, “last”, “max”, “min”, “sum”, “all” (conjunction of all values), “any” (disjunction of all values). |
ds |
(optional) name of the data source of the result table |
table |
(optional) name of the result table |
names |
a list of column names of the result table |
overwrite |
whether to overwrite table with the name given in argument
|
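The grouping semantics described above can be sketched in plain Python on public data. This is only an illustration, not Rmind code: the helper name `aggregate_sketch` is hypothetical, only some of the listed functions are shown, and the real procedure operates on private vectors and returns a table.

```python
from collections import defaultdict

def aggregate_sketch(keys, values, funs):
    """Group `values` by `keys` and compute one summary per function name."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    summaries = {
        "count": len,
        "sum": sum,
        "avg": lambda g: sum(g) / len(g),
        "min": min,
        "max": max,
        "first": lambda g: g[0],
        "last": lambda g: g[-1],
    }
    # One result row per group: the key maps to the requested summaries.
    return {k: [summaries[f](g) for f in funs] for k, g in groups.items()}

result = aggregate_sketch([1, 2, 1, 2, 1], [10, 20, 30, 40, 50],
                          ["count", "sum", "avg"])
```

As in the real procedure, callers should not rely on the order in which the groups come back.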
boxplot
cbind
Description
Concatenate tables by columns. If the ds and table arguments are not given, a temporary table is created.
Usage
cbind(table1, table2)
cbind(table1, table2, result.ds=NULL, result.table=NULL, result.overwrite=FALSE)
Arguments
… |
positional arguments are tables or lists of columns designating tables or keyword arguments of private vectors where the keyword names the new column. |
result.ds |
(optional) data source of the result table |
result.table |
(optional) name of the result table |
result.overwrite |
whether to overwrite table with the name given in argument “result.table” |
chisq.test
Description
Perform Pearson’s chi-squared test of independence or goodness of fit test.
The x and y arguments are data vectors. Note that booleans and signed integers will be converted to an unsigned integral type, so the data should not contain negative values.
If only the first input is supplied, it is handled as a vector of frequencies and a goodness of fit test is performed. You can supply the expected probabilities as the p argument.
If both x and y are supplied, they are handled as factors and cross tabulated. Pearson’s test of independence is performed on the contingency table. If the inputs are integers rather than factors, you must specify the possible levels as public vectors using the xlevels and ylevels arguments.
The test returns TRUE if the null hypothesis is rejected.
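For the goodness of fit case, the test statistic can be sketched in plain Python as follows. This is an illustration on public data, not Rmind code. The helper name `chisq_statistic` is hypothetical, and falling back to uniform probabilities when p is omitted is an assumption, not something the text above states.

```python
def chisq_statistic(observed, p=None):
    """Pearson's chi-squared statistic for a vector of frequencies."""
    n = sum(observed)
    if p is None:
        # Assumption: equal expected probabilities when p is not given.
        p = [1.0 / len(observed)] * len(observed)
    expected = [n * pi for pi in p]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

stat = chisq_statistic([10, 20, 30])
```

The statistic is then compared against the chi-squared distribution with (number of categories - 1) degrees of freedom to decide whether the null hypothesis is rejected.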
ctree.importance
Description
Returns [information gain, important feature, its threshold] for each node.
For example, when the max.depth in ctree() is three, this function returns a list with seven elements. The number of elements (nodes) is 2 ^ max.depth - 1 (binary tree). Nodes are numbered in breadth-first order and node1 is the root.
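The node count and numbering can be sketched in plain Python. The helper names are hypothetical, and the parent-child rule shown (children of node i are 2i and 2i + 1) is the usual consequence of breadth-first numbering of a complete binary tree with root 1. It is an assumption about how the nodes relate, not something the description states.

```python
def node_count(max_depth):
    """Number of nodes in a complete binary tree of the given depth."""
    return 2 ** max_depth - 1

def children(node, max_depth):
    """Children of `node` under breadth-first numbering with root = 1."""
    candidates = (2 * node, 2 * node + 1)
    return [c for c in candidates if c <= node_count(max_depth)]
```

With max_depth = 3 the root has children 2 and 3, and nodes 4 to 7 are leaves.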
ctree
Description
Fits decision tree classifier.
The function returns a list with six elements. The elements are:
- algo - this function name;
- coefficients - a private matrix indicating variable importance. To get the information for each node, use ctree.importance();
- max.depth - a number indicating the maximum depth of the tree;
- min.label - a number indicating the minimum label, needed for prediction;
- ml.type - a string indicating the machine learning type, needed for prediction;
- variable.names - variable names, excluding the dependent variable.
Arguments
model |
linear model formula. Scaling is not necessary. |
max.depth |
a number indicating the maximum depth of the tree |
date
dlmConfidenceInterval
Description
Returns the ceiling and floor values of the confidence interval based on the given z.
The function returns a named list with two elements. The elements are:
- ceiling - a vector indicating the ceiling mean values for z
- floor - a vector indicating the floor mean values for z
Arguments
mean |
a vector indicating the mean values |
var |
a vector indicating the variances |
z |
a number indicating the critical value |
Example
# for 95% confidence interval
filtered.mod <- dlmFilter(y, dlmModPoly(order=1))
result <- dlmConfidenceInterval(filtered.mod$m, filtered.mod$C, 1.96)
# plotting example
figure <- plot(
  c(1:length(filtered.mod$m)),
  list(
    filtered.mod$m,
    result$ceiling,
    result$floor
  ),
  title="Kalman Filter: local level",
  type="l",
  ratio=4
)
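A plain Python sketch of what the ceiling and floor values presumably are. It assumes the usual normal-approximation bounds mean ± z * sqrt(var), which the description does not spell out, and the helper name `confidence_interval` is hypothetical.

```python
import math

def confidence_interval(mean, var, z):
    """Element-wise bounds mean +/- z * sqrt(var) (assumed formula)."""
    ceiling = [m + z * math.sqrt(v) for m, v in zip(mean, var)]
    floor = [m - z * math.sqrt(v) for m, v in zip(mean, var)]
    return {"ceiling": ceiling, "floor": floor}

ci = confidence_interval([10.0, 12.0], [4.0, 1.0], 1.96)
```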
dlmFilter
Description
Returns the results of the Kalman filter for a DLM which is specified by either dlmModPoly or dlmModJoint.
The function returns a named list.
When mod is the result of dlmModPoly(order=1), the elements are:
- mod - the updated DLM. A private matrix.
- m - public filtered values of the state vectors (local level)
- C - public filtered values of the variances of the estimation errors
When mod is the result of dlmModPoly(order=2), the elements are:
- mod - the updated DLM. Another named list with private vectors or matrices.
- m0 - public filtered values of the state vectors (for local level component)
- m1 - public filtered values of the state vectors (for local trend component)
- C0 - public filtered values of the variances of the estimation errors (for local level component)
- C1 - public filtered values of the variances of the estimation errors (for local trend component)
When mod is the result of dlmModJoint, the elements are:
- mod - the updated DLM. Another named list with private vectors or matrices.
- m0 - public filtered values of the state vectors (for local level component)
- m1 - public filtered values of the state vectors (for local trend component)
- m2 - public filtered values of the state vectors (for local cycle component)
- C0 - public filtered values of the variances of the estimation errors (for local level component)
- C1 - public filtered values of the variances of the estimation errors (for local trend component)
- C2 - public filtered values of the variances of the estimation errors (for local cycle component)
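For the local level case (dlmModPoly(order=1)), the textbook Kalman filter recursion can be sketched in plain Python on public data. This is a generic illustration, not the Sharemind implementation: the helper name `local_level_filter` is hypothetical, and the parameter names merely mirror the dV, dW, m0 and C0 arguments of dlmModPoly.

```python
def local_level_filter(y, dV=1.0, dW=1.0, m0=0.0, C0=1e7):
    """Kalman filter for the local level model y_t = mu_t + v_t, mu_t = mu_{t-1} + w_t."""
    m, C = m0, C0
    ms, Cs = [], []
    for obs in y:
        a, R = m, C + dW        # one-step prediction of the state and its variance
        Q = R + dV              # variance of the one-step forecast of the observation
        K = R / Q               # Kalman gain
        m = a + K * (obs - a)   # filtered state
        C = R - K * R           # filtered variance
        ms.append(m)
        Cs.append(C)
    return {"m": ms, "C": Cs}

out = local_level_filter([1.0, 1.0], dV=1.0, dW=0.0, m0=0.0, C0=1.0)
```

The returned m and C play the same role as the m and C elements listed for the order=1 case.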
dlmForecast
Description
Returns the expected value and variance of future system states.
The function returns a named list.
When filtered mod is based on dlmModPoly(order=1), the elements are:
- a - public smoothed values of the expected future states
- R - public smoothed values of the variances of the expected errors
When filtered mod is based on the other DLM, the elements are:
- a0 - public smoothed values of the expected future states (for local level component)
- a1 - public smoothed values of the expected future states (for local trend component)
- R0 - public smoothed values of the variances of the expected errors (for local level component)
- R1 - public smoothed values of the variances of the expected errors (for local trend component)
Arguments
mod | a filtered DLM (the result of dlmFilter ) |
---|---|
nAhead |
the number of steps ahead for which a forecast is requested. A positive integer. |
dlmLiuWestFilter
Description
Applies the Liu-West filter with Rao-Blackwellization to a DLM which is specified by dlmModPoly(order=1).
The argument mod needs to be the result of dlmModPoly(order=1), but its dV and dW values will be ignored.
The function returns a named list. The elements are:
- m - public filtered values of the state vectors (local level)
- C - public filtered values of the variances of the estimation errors
- V - estimated variance of the observation noise
- W - estimated diagonal elements of the variance matrix of the system noise
Once the values V and W are confirmed to converge to constant values, it is recommended to pass those values to dlmModPoly() and apply dlmFilter().
dlmModPoly
Description
Returns an Nth-order polynomial DLM (Dynamic Linear Model).
The function returns a named list with six elements. The elements are:
- order - a number (N), either 1 or 2, indicating the order of the polynomial model;
- component - a string indicating the component of the DLM;
- dV - a number indicating the variance of the observation noise;
- dW - a number or a double-type vector, indicating the diagonal elements of the variance matrix of the system noise;
- m0 - a number or a double-type vector, indicating the element(s) of the expected value of the pre-sample state vector;
- C0 - a number or a double-type vector, indicating the element(s) of the variance matrix of the pre-sample state vector.
Usage
dlmModPoly(1)
dlmModPoly(2)
dlmModPoly(order=1, dV=1, dW=1, m0=0, C0=1e7)
dlmModPoly(order=2, dV=1, dW=c(1, 0), m0=c(0, 0), C0=c(1e7, 1e7))
Arguments
order |
a number indicating the order of the polynomial model (either 1 or 2) |
dV |
a number indicating the variance of the observation noise |
dW |
a number or a double-type vector, indicating the diagonal elements of the variance matrix of the system noise |
m0 |
a number or a double-type vector, indicating the element(s) of the expected value of the pre-sample state vector |
C0 |
a number or a double-type vector, indicating the element(s) of the variance matrix of the pre-sample state vector |
dlmModSeas
Description
Returns a DLM representing a specified seasonal component.
The function returns a named list with six elements. The elements are:
- order - a number (N) indicating the order of the seasonal model (frequency - 1);
- component - a string indicating the component of the DLM;
- dV - a number indicating the variance of the observation noise;
- dW - a vector indicating the diagonal elements of the variance matrix of the system noise;
- m0 - a vector indicating the element(s) of the expected value of the pre-sample state vector;
- C0 - a vector indicating the element(s) of the variance matrix of the pre-sample state vector.
For filtering, use dlmModJoint before dlmFilter.
Usage
# no default values
dlmModSeas(4)
dlmModSeas(frequency=4, dV=1, dW=c(1, 0, 0), m0=c(0, 0, 0), C0=c(1e7, 1e7, 1e7))
Arguments
frequency | a number indicating the number of seasons |
---|---|
dV |
a number indicating the variance of the observation noise |
dW |
a vector indicating the diagonal elements of the variance matrix of the system noise |
m0 |
a vector indicating the element(s) of the expected value of the pre-sample state vector |
C0 |
a vector indicating the element(s) of the variance matrix of the pre-sample state vector |
dlmModTrig
Description
Returns a DLM representing a specified periodic component.
The function returns a named list with eight elements. The elements are:
- order - a number (N) indicating the order of the periodic model (2 * q - 1);
- component - a string indicating the component of the DLM;
- s - a number indicating the period;
- q - a number indicating the number of harmonics in the DLM;
- dV - a number indicating the variance of the observation noise;
- dW - a number indicating the diagonal elements of the variance matrix of the system noise;
- m0 - a vector indicating the element(s) of the expected value of the pre-sample state vector;
- C0 - a vector indicating the element(s) of the variance matrix of the pre-sample state vector.
For filtering, use dlmModJoint before dlmFilter.
Usage
# no default values
dlmModTrig(4, 2)
dlmModTrig(s=4, q=2, dV=1, dW=0, m0=c(0, 0, 0), C0=c(1e7, 1e7, 1e7))
Arguments
s | a number indicating the period |
---|---|
q |
a number indicating the number of harmonics in the DLM. Must be an
even number and not exceed |
dV |
a number indicating the variance of the observation noise |
dW |
a number indicating the diagonal elements of the variance matrix of the system noise |
m0 |
a vector indicating the element(s) of the expected value of the pre-sample state vector |
C0 |
a vector indicating the element(s) of the variance matrix of the pre-sample state vector |
dlmSmooth
Description
Returns the results of the Kalman smoother using a filtered DLM.
The function returns a named list.
When filtered mod is based on dlmModPoly(order=1), the elements are:
- s - public smoothed values of the state vectors
- S - public smoothed values of the variances of the smoothing errors
When filtered mod is based on the other DLM, the elements are:
- s0 - public smoothed values of the state vectors (for local level component)
- s1 - public smoothed values of the state vectors (for local trend component)
- S0 - public smoothed values of the variances of the smoothing errors (for local level component)
- S1 - public smoothed values of the variances of the smoothing errors (for local trend component)
factor
Description
Create a private factor vector from a list of public strings.
The mapping argument specifies how the string levels are mapped to integer codes. This is an optimisation used to speed up computations with factors in Sharemind. An example of the mapping argument would be list(a=1, b=2) when there are two levels (“a” and “b”). The codes must be sequential and start from 1.
You can get the factor mapping of an existing factor using levels. This allows you to create a new factor vector with the same mapping as an existing one like this:
factor(list("a"), levels(x))
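The mapping rule (codes sequential and starting from 1) can be sketched in plain Python. This is an illustration on public data only. The helper name `make_factor` is hypothetical and the real function produces a private vector.

```python
def make_factor(values, mapping):
    """Map string levels to integer codes after validating the mapping."""
    codes = sorted(mapping.values())
    if codes != list(range(1, len(codes) + 1)):
        raise ValueError("codes must be sequential and start from 1")
    return [mapping[v] for v in values]

codes = make_factor(["a", "b", "a"], {"a": 1, "b": 2})
```

A mapping such as {"a": 1, "b": 3} is rejected by this sketch because the codes skip 2.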
fft
Description
Calculates the Discrete Fourier Transform (DFT) with the Fast Fourier Transform (FFT) algorithm.
This function returns a named list with two elements: real and imag. Both are vectors.
Warning: only vectors whose length is a power of 2 are supported. If the input vector length is not a power of 2, the function slices the vector at the largest power of 2 that does not exceed its length (e.g., if the length is 100, the function slices the vector at index 64).
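The slicing rule in the warning can be sketched in plain Python. The helper names are hypothetical, and keeping the leading elements is an assumption based on the "slices at index 64" wording.

```python
def largest_power_of_two(n):
    """Largest power of 2 that does not exceed n (n >= 1)."""
    if n < 1:
        raise ValueError("length must be positive")
    return 1 << (n.bit_length() - 1)

def truncate_for_fft(values):
    """Keep only the leading power-of-2 prefix of the input (assumed behaviour)."""
    return values[:largest_power_of_two(len(values))]
```

Padding or trimming the input yourself before calling fft avoids silently losing the trailing elements.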
fisher.test
Description
Perform Fisher’s exact test to test for association between two binary variables. The numeric values 0 and 1 are always taken as the levels, so make sure to convert your inputs to this format. This is performed implicitly when one of the inputs is a factor with two levels. Returns TRUE if the null hypothesis is rejected and FALSE otherwise.
freq
freqplot
glm
Description
Fits generalized linear models. Only two types of models are currently supported: regular linear regression (if family is “gaussian”) and logistic regression (if family is “binomial-logit”). The documentation of lm describes the different methods of solving systems of linear equations. The function returns a list with six elements. The elements are:
- algo - this function name;
- coefficients - fitted coefficients (the first element is the intercept);
- family - a string indicating the model type;
- ml.type - a string indicating the machine learning type, needed for prediction;
- model - list of data vectors (the first one is the dependent variable vector);
- variable.names - variable names corresponding to model but without the dependent variable.
Arguments
model | linear model formula |
---|---|
family |
a string indicating the distribution of the dependent variable and the type of link function (either “gaussian” or “binomial-logit”) |
iterations |
number of iterations of the algorithm |
sole.method |
a string indicating the method used for solving systems of linear equations (either “gauss”, “lu”, “conjugate-gradient” or “inversion”) |
sole.iterations |
if |
glmnet
Description
Fits generalized linear models with Dai-Liao non-linear conjugate gradient optimization with the strong Wolfe conditions. Supported families are “gaussian”, “binomial-logit”, “gamma”, and “poisson”. When family="binomial-logit", the dependent variable must be either 0 or 1.
The function returns a list with six elements. The elements are:
- algo - this function name;
- coefficients - fitted coefficients (the first element is the intercept);
- family - a string indicating the model type;
- ml.type - a string indicating the machine learning type, needed for prediction;
- model - list of data vectors (the first one is the dependent variable vector);
- variable.names - variable names corresponding to model but without the dependent variable.
Usage
glmnet(model)
glmnet(model, family="gaussian", iterations=10, wolfe.iterations=5,
alpha=1, lambda.lasso=0.1, lambda.ridge=0.1)
Arguments
model |
linear model formula. Applying scaled variables is highly recommended. |
family |
a string indicating the distribution of the dependent variable and the type of link function (either “gaussian”, “binomial-logit”, “gamma”, or “poisson” ) |
iterations |
number of iterations of the non-linear CG optimization. Start with a small number and increase gradually as needed. |
wolfe.iterations |
number of iterations of the Wolfe conditions line search. Start with a small number and increase gradually as needed. |
alpha |
the elastic net mixing parameter. |
lambda.lasso |
the lasso regularization parameter |
lambda.ridge |
the ridge regularization parameter |
heatmap
Description
Plots a heatmap from private data. This is a replacement for X-Y plots in ordinary statistics software. The function takes two private vectors consisting of the x and y coordinates respectively. Instead of drawing each point separately, the number of points falling in an area is counted and a shade is assigned to the area.
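The counting step can be sketched in plain Python on public data. This is only an illustration of the binning idea: the helper name `bin_counts`, the grid size parameters and the explicit axis ranges are all hypothetical, and the sketch assumes every point lies within the given ranges.

```python
def bin_counts(xs, ys, nx, ny, x_range, y_range):
    """Count how many (x, y) points fall into each cell of an ny-by-nx grid."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    grid = [[0] * nx for _ in range(ny)]
    for x, y in zip(xs, ys):
        # Points on the upper boundary are placed in the last cell.
        col = min(int((x - x_lo) / (x_hi - x_lo) * nx), nx - 1)
        row = min(int((y - y_lo) / (y_hi - y_lo) * ny), ny - 1)
        grid[row][col] += 1
    return grid

grid = bin_counts([0.1, 0.2, 0.9], [0.1, 0.3, 0.8], 2, 2, (0.0, 1.0), (0.0, 1.0))
```

Each count is then mapped to a shade, so individual points are never shown.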
hoslem.test
lapply
lm
Description
Perform linear regression. The methods used for solving systems of linear equations are “lu” (LU decomposition), “gauss” (Gaussian elimination), “conjugate-gradient” (conjugate gradient method) and “inversion” (matrix inversion). The conjugate gradient method is fast, but its accuracy depends on the number of iterations. Matrix inversion is fast but is only implemented for fewer than four variables. If the method argument is NULL, either matrix inversion or LU decomposition is chosen depending on the number of variables. The function returns a vector with the estimated coefficients. The first element is the intercept.
mann.whitney.test
Usage
mann.whitney.test(x, y)
mann.whitney.test(x, y, alternative="two.sided", significance=0.05, correct=TRUE)
Arguments
x | private vector of the first sample |
---|---|
y |
private vector of the second sample |
alternative |
a string indicating the type of the alternative hypothesis (either “two.sided”, “less” or “greater”) |
significance |
significance level |
correct |
a boolean indicating if tied ranks should be replaced with their average. This will be slower but less conservative. |
merge
Description
Merge two Sharemind database tables. The original tables remain unchanged. If by, by.x and by.y are unspecified, the procedure attempts to find a column with a common name. If multiple such pairs exist, use the by argument. If the key column is named differently in each table, use by.x and by.y. To perform a cross join, give NULL as the value of by. Use a list of strings to specify a multi-column key.
Note that the order of rows in the output is not deterministic.
Arguments
x | first Sharemind table |
---|---|
y |
second Sharemind table |
by |
name of the key column |
by.x |
name of the key column of the first table |
by.y |
name of the key column of the second table |
all |
all rows of both tables should be in the result table (full outer join) |
all.x |
all rows of the first table should be in the result table (left outer join) |
all.y |
all rows of the second table should be in the result table (right outer join) |
multiple.chisq.test
Description
Perform simultaneous chi-squared tests. Note that the data will be converted to an unsigned integral type, so it should not contain negative values. The codebook is a list of two vectors. The first contains the values expected in the input and the second contains the values that they are assigned to. This can be used to turn multiple categories into one. If you do not wish to change categories, pass list(a:b, a:b) as the codebook argument, where a is the minimum and b the maximum expected value. The Benjamini-Hochberg correction is less conservative but slower. Returns a list with indices of the data sets for which the null hypothesis was rejected.
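The codebook recoding can be sketched in plain Python on public data. The helper name `apply_codebook` is hypothetical, and the sketch only illustrates the value mapping, not the test itself.

```python
def apply_codebook(values, codebook):
    """Replace each expected input value with the value assigned to it."""
    expected, assigned = codebook
    recode = dict(zip(expected, assigned))
    return [recode[v] for v in values]

# Merge categories 1 and 2 into 1, and categories 3 and 4 into 2.
merged = apply_codebook([1, 2, 3, 4], ([1, 2, 3, 4], [1, 1, 2, 2]))

# Identity codebook, the analogue of list(a:b, a:b) with a = 3 and b = 5.
identity = apply_codebook([3, 5], (list(range(3, 6)), list(range(3, 6))))
```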
Usage
multiple.chisq.test(dataList, filter, codebook)
multiple.chisq.test(dataList, filter, codebook, significance=0.05, method="benjamini-hochberg")
Arguments
x | list of private vectors. The vectors can not contain missing values. |
---|---|
filter |
a private filter vector. Should be a boolean vector or consist of zeroes and ones. This is used to divide each private vector into two groups. |
codebook |
codebook list |
significance |
significance level |
method |
correction method (either “bonferroni” or “benjamini-hochberg”) |
multiple.t.test
Description
Perform simultaneous t-tests. The tests are two-sided (the alternative hypothesis is that the means of the groups are different). The variances of the two groups are assumed to be equal. There is no paired version. The Benjamini-Hochberg correction is less conservative but slower. Returns a list with indices of the data sets for which the null hypothesis was rejected.
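The difference between the two correction methods can be sketched in plain Python on public p-values. These are the standard textbook procedures with hypothetical helper names. How the corrections are carried out on private data is not described here, and the indices below are 0-based, which may differ from the indexing of the returned list.

```python
def bonferroni(p_values, significance=0.05):
    """Reject every hypothesis whose p-value is at most significance / m."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= significance / m]

def benjamini_hochberg(p_values, significance=0.05):
    """Reject the k smallest p-values, where k is the largest rank passing the step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * significance:
            k = rank
    return sorted(order[:k])

p = [0.001, 0.02, 0.03, 0.5]
strict = bonferroni(p)
relaxed = benjamini_hochberg(p)
```

On this input Benjamini-Hochberg rejects more hypotheses than Bonferroni, which is what "less conservative" means in practice. It also needs the p-values in sorted order.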
Usage
multiple.t.test(dataList, filter)
multiple.t.test(dataList, filter, significance=0.05, method="benjamini-hochberg")
Arguments
x | list of private vectors. The vectors can not contain missing values. |
---|---|
filter |
a private filter vector. Should be a boolean vector or consist of zeroes and ones. This is used to divide each private vector into two groups. |
significance |
significance level |
method |
correction method (either “bonferroni” or “benjamini-hochberg”) |
multiplot
plot
Description
Produce an X-Y plot of public values.
The legend position consists of four components - vertical position (top, bottom or center), horizontal position (left, right or center), position in relation to the frame (inside or outside) and position of elements inside the legend (vertical or horizontal).
Usage
plot(x, y)
plot(x, y, title="", xlab="", ylab="", ylim=NULL, type=NULL, label.points=FALSE, data.labels=NULL, legend.pos="top right inside vertical", ratio=4/3)
Arguments
x | x axis coordinate vector (or list of vectors) |
---|---|
y |
y axis coordinate vector (or list of vectors) |
title |
plot title |
xlab |
x axis label |
ylab |
y axis label |
ylim |
a two-element vector specifying the minimum and maximum elements of the y axis |
type |
a string indicating the type of the plot (either “p” for points, “l” for lines or “b” for both) |
label.points |
a boolean indicating if points should have labels |
data.labels |
list of data set names |
legend.pos |
legend position |
ratio |
width/height ratio |
prcomp
Description
Perform principal component analysis. A list with three values is returned:
- loads is a list of vectors where the nth vector consists of the coordinates of the nth principal component;
- scores is a private matrix consisting of the data transformed to the principal component space;
- residuals is the residual matrix (difference between the data matrix and the matrix reconstructed from the principal components).
predict
Arguments
result | the result of a modeling function |
---|---|
data |
a Sharemind table with the same structure as the data used in a
modeling function, specified in the |
randomForest
Description
Fits random forest classifier.
The function returns a list with seven elements. The elements are:
- algo - this function name;
- coefficients - a private matrix indicating variable importance. To get the information for each node, use randomForest.importance();
- max.depth - a number indicating the maximum depth of the tree;
- min.label - a number indicating the minimum label, needed for prediction;
- ml.type - a string indicating the machine learning type, needed for prediction;
- ntree - a number indicating the number of trees;
- variable.names - variable names, excluding the dependent variable.
Arguments
model |
linear model formula. Scaling is not necessary. |
max.depth |
a number indicating the maximum depth of each tree |
ntree |
a number indicating the number of the trees |
rbind
Description
Concatenate tables by row. If the ds and table arguments are not given, a temporary table is created.
Usage
rbind(table1, table2)
rbind(table1, table2, result.ds=NULL, result.table=NULL, result.overwrite=FALSE, result.names=NULL)
Arguments
… |
positional arguments are tables or lists of columns designating tables |
result.ds |
(optional) data source of the result table |
result.table |
(optional) name of the result table |
result.overwrite |
whether to overwrite table with the name given in
argument |
result.names |
a list of column names of the result table |
rm.missing
Arguments
x | a private vector or matrix |
---|---|
value |
an argument specifying the value that missing values are
replaced by. It can be a scalar or a list of scalars if the input is a
matrix in which case the list elements give replacement values for each
matrix column. If no value is specified, missing values will be replaced
by |
rm.outliers
Description
Remove outliers of a private vector. The length of the vector will remain the same. Outliers will be marked as not available. There are two methods - “quantiles” and “mad”. If the constant argument is x, “quantiles” will remove values below the (x * 100%) quantile and over the ((1 - x) * 100%) quantile. The default constant is 0.05. If the method is “mad”, the median absolute deviation method of outlier detection is used and the constant will be the lambda of the MAD formula. The default value is 3.
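The “mad” method can be sketched in plain Python on public data. The exact MAD formula is not given above, so this sketch assumes the common rule that a value is an outlier when its distance from the median exceeds lambda times the unscaled median absolute deviation. The helper name `rm_outliers_mad` is hypothetical and None stands in for "not available".

```python
from statistics import median

def rm_outliers_mad(values, constant=3.0):
    """Mark values farther than constant * MAD from the median as missing."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # The output has the same length as the input; outliers become None.
    return [v if abs(v - med) <= constant * mad else None for v in values]

cleaned = rm_outliers_mad([1, 2, 3, 4, 100])
```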
roc
sapply
save
scale
screeplot
shift.left
Description
Left-shift elements of each row of a table. When an element of the shift vector is missing, the whole row is marked missing in the result.
Arguments
shift |
a private integral vector containing the shift amount for each row |
data |
a matrix representing the table to be shifted |
left.padding |
indicates how many columns of missing values to add to the beginning of the matrix |
right.padding |
indicates how many columns of missing values to add to the end of the matrix |
show
sort
Description
Sorts a Sharemind database table. A new table is created and the original data remains unchanged.
Arguments
x | Sharemind table |
---|---|
by |
a string or a list of strings. The table will be sorted by those columns. Sorting is performed backwards (starting from the last column in the list) so that the result is ordered by the first column. Rows with equal values in the first column are ordered by the next column, and so on. |
dir |
a string or a list of strings which specifies the sorting
direction for each column in the |
na |
a string or a list of strings which specifies the order of missing values. The possible values are "first" (missing values come before other values) and "last" (missing values come after other values). |
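The "sorting backwards" idea for multi-column keys can be sketched in plain Python: applying a stable sort once per column, from the last key to the first, leaves the table ordered by the first key with ties broken by the later ones. The helper name `sort_table` is hypothetical, and the sketch ignores the dir and na arguments.

```python
def sort_table(rows, by):
    """Sort a list of row dicts by several columns using repeated stable sorts."""
    out = list(rows)  # the original table is left unchanged
    for col in reversed(by):
        out.sort(key=lambda r: r[col])  # list.sort is stable
    return out

rows = [{"a": 2, "b": 1}, {"a": 1, "b": 2}, {"a": 1, "b": 1}]
sorted_rows = sort_table(rows, ["a", "b"])
```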
store.table
subset
summary.glm
Description
Compute the Akaike information criterion, standard errors and p-values of coefficients of generalized linear models. See the documentation for the glm procedure. A list with three values is returned:
- std.errors is a vector consisting of the standard errors of the coefficients;
- p.values is a vector consisting of the Wald test p-values of the coefficients;
- aic is the Akaike information criterion.
t.test
Description
Performs Student’s or Welch’s t-test. If var.equal is FALSE, the Welch-Satterthwaite equation is used to find the degrees of freedom. Returns TRUE if the null hypothesis is rejected and FALSE otherwise.
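The Welch-Satterthwaite equation mentioned above can be sketched in plain Python. This is the standard formula computed on public sample variances and sizes, with a hypothetical helper name. It does not show how the computation is performed on private data.

```python
def welch_df(var_x, n_x, var_y, n_y):
    """Approximate degrees of freedom for Welch's t-test."""
    a = var_x / n_x
    b = var_y / n_y
    return (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))

df = welch_df(4.0, 10, 4.0, 10)
```

When both samples have the same size and variance, the result coincides with the n_x + n_y - 2 degrees of freedom of Student's t-test.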
Usage
t.test(x, y)
t.test(x, y, paired=FALSE, var.equal=TRUE, significance=0.05, alternative="two.sided", mu=0)
Arguments
x | private vector of the first sample |
---|---|
y |
private vector of the second sample |
paired |
a boolean indicating if a paired t-test should be performed |
var.equal |
a boolean indicating if the variances of the samples are expected to be equal |
significance |
significance level |
alternative |
a string indicating the type of the alternative hypothesis (either “two.sided”, “less” or “greater”) |
mu |
a number indicating the difference of means |
tryCatch
unique.id
Description
Computes a public unique ID from columns of the input table. Does not modify the input table but returns a new table that contains the original columns and an extra UID column. The rows of the resulting table are shuffled.
The unique ID column can later be used to sort and aggregate.
Arguments
x | Sharemind table |
---|---|
by |
a string or a list of strings denoting columns of the input table. The input columns will be combined row-wise to form the UID. If two UID rows are equal, this means that (with extremely high probability) the rows of the original data are also equal. If two UID rows are not equal, this means that the original data rows were not equal either. |
name |
(optional) name of the added UID column. The default name is “UID”. |
wilcoxon.test
Description
Perform Wilcoxon rank sum or Wilcoxon signed rank tests. If paired is TRUE, the Wilcoxon signed rank test is performed, otherwise the Wilcoxon rank sum test is performed. Returns TRUE if the null hypothesis is rejected and FALSE otherwise.
Usage
wilcoxon.test(x, y)
wilcoxon.test(x, y, paired=FALSE, alternative="two.sided", significance=0.05, correct=TRUE)
Arguments
x | private vector of the first sample |
---|---|
y |
private vector of the second sample |
paired |
a boolean indicating if the samples are paired or not |
alternative |
a string indicating the type of the alternative hypothesis (either “two.sided”, “less” or “greater”) |
significance |
significance level |
correct |
a boolean indicating if tied ranks should be replaced with their average. This will be slower but less conservative. |
xgboost
Description
Fits XGBoost.
The function returns a list with 14 elements. The elements are:
- algo - this function name;
- coefficients - a private matrix indicating variable importance;
- eta - a number indicating the used step size shrinkage;
- gamma - a number indicating the used minimum loss reduction;
- lambda - a number indicating the used L2 regularization penalty;
- max.depth - a number indicating the used maximum depth of the tree;
- min.child.weight - a number indicating the used minimum sum of instance weight needed in a child node;
- ml.type - a string indicating the machine learning type, needed for prediction;
- nrounds - a number indicating the fitted number of trees;
- nsplits - a number indicating the used number of bins for the approximate greedy algorithm (nsplits corresponds to 1/sketch_eps);
- objective - a string indicating the used objective for fitting;
- split.is.global - a boolean indicating whether the approximate greedy algorithm is run globally or not (locally);
- subsampling - a number indicating the used subsampling ratio of the fitting instances;
- variable.names - variable names, excluding the dependent variable.
Usage
xgboost(model)
params <- list(
  eta=0.3,
  gamma=1,
  lambda=1,
  objective="regression",
  max.depth=3,
  min.child.weight=1,
  subsampling=0.7,
  nsplit=20,
  split.is.global=TRUE
)
xgboost(model, params=params, nrounds=3)
Arguments
model | linear model formula. Scaling is not necessary. |
---|---|
params |
a named list of parameters. Each element follows its definition in https://xgboost.readthedocs.io/en/latest/parameter.html. If the list specifies only some of the above elements, the remaining elements take their default values. See the Value Restrictions below. |
nrounds |
a number indicating the number of the trees |