% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.cv.R
\name{xgb.cv}
\alias{xgb.cv}
\title{Cross Validation}
\usage{
xgb.cv(
  params = list(),
  data,
  nrounds,
  nfold,
  label = NULL,
  missing = NA,
  prediction = FALSE,
  showsd = TRUE,
  metrics = list(),
  obj = NULL,
  feval = NULL,
  stratified = TRUE,
  folds = NULL,
  train_folds = NULL,
  verbose = TRUE,
  print_every_n = 1L,
  early_stopping_rounds = NULL,
  maximize = NULL,
  callbacks = list(),
  ...
)
}
\arguments{
\item{params}{the list of parameters. The complete list of parameters is
available in the \href{https://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}.
Below is a shorter summary:
\itemize{
\item \code{objective} objective function; common ones are
\itemize{
\item \code{reg:squarederror} regression with squared loss.
\item \code{binary:logistic} logistic regression for classification.
\item See \code{\link[=xgb.train]{xgb.train}()} for the complete list of objectives.
}
\item \code{eta} step size of each boosting step
\item \code{max_depth} maximum depth of the tree
\item \code{nthread} number of threads used in training; if not set, all threads are used
}

See \code{\link{xgb.train}} for further details.
See also demo/ for walkthrough examples in R.}

\item{data}{takes an \code{xgb.DMatrix}, \code{matrix}, or \code{dgCMatrix} as the input.}

\item{nrounds}{the maximum number of boosting iterations.}

\item{nfold}{the original dataset is randomly partitioned into \code{nfold} equal size subsamples.}

\item{label}{vector of response values. Should be provided only when data is an R matrix.}

\item{missing}{is only used when input is a dense matrix. By default it is set to \code{NA},
which means that \code{NA} values should be considered as 'missing' by the algorithm.
Sometimes, 0 or another extreme value might be used to represent missing values.}

\item{prediction}{A logical value indicating whether to return the test fold predictions
from each CV model. This parameter engages the \code{\link{cb.cv.predict}} callback.}

\item{showsd}{\code{boolean}, whether to show the standard deviation of cross validation.}

\item{metrics}{list of evaluation metrics to be used in cross validation;
when it is not specified, the evaluation metric is chosen according to the objective function.
Possible options are:
\itemize{
\item \code{error} binary classification error rate
\item \code{rmse} root mean square error
\item \code{logloss} negative log-likelihood function
\item \code{mae} mean absolute error
\item \code{mape} mean absolute percentage error
\item \code{auc} area under the ROC curve
\item \code{aucpr} area under the PR curve
\item \code{merror} exact matching error, used to evaluate multi-class classification
}}

\item{obj}{customized objective function. Returns the gradient and second order
gradient with given prediction and dtrain.}

\item{feval}{customized evaluation function. Returns
\code{list(metric='metric-name', value='metric-value')} with given prediction and dtrain.}

\item{stratified}{a \code{boolean} indicating whether sampling of folds should be stratified
by the values of outcome labels.}

\item{folds}{\code{list} provides a possibility to use a list of pre-defined CV folds
(each element must be a vector of test fold's indices). When folds are supplied,
the \code{nfold} and \code{stratified} parameters are ignored.}

\item{train_folds}{\code{list} specifying which indices to use for training.
If \code{NULL} (the default), all indices not specified in \code{folds} will be used for training.}
\item{verbose}{\code{boolean}, whether to print statistics during the process.}

\item{print_every_n}{Print every n-th iteration's evaluation messages when \code{verbose > 0}.
Default is 1, which means all messages are printed. This parameter is passed to the
\code{\link{cb.print.evaluation}} callback.}

\item{early_stopping_rounds}{If \code{NULL}, early stopping is not triggered.
If set to an integer \code{k}, training with a validation set will stop if the performance
doesn't improve for \code{k} rounds. Setting this parameter engages the
\code{\link{cb.early.stop}} callback.}

\item{maximize}{If \code{feval} and \code{early_stopping_rounds} are set, then this parameter
must be set as well. When it is \code{TRUE}, it means the larger the evaluation score the better.
This parameter is passed to the \code{\link{cb.early.stop}} callback.}

\item{callbacks}{a list of callback functions to perform various tasks during boosting.
See \code{\link{callbacks}}. Some of the callbacks are automatically created depending on
the parameters' values. Users can provide either existing or their own callback methods
in order to customize the training process.}

\item{...}{other parameters to pass to \code{params}.}
}
\value{
An object of class \code{xgb.cv.synchronous} with the following elements:
\itemize{
\item \code{call} the function call.
\item \code{params} parameters that were passed to the xgboost library. Note that it does not
capture parameters changed by the \code{\link{cb.reset.parameters}} callback.
\item \code{callbacks} callback functions that were either automatically assigned or
explicitly passed.
\item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
first column corresponding to the iteration number and the rest corresponding to the
CV-based evaluation means and standard deviations for the training and test CV-sets.
It is created by the \code{\link{cb.evaluation.log}} callback.
\item \code{niter} number of boosting iterations.
\item \code{nfeatures} number of features in training data.
\item \code{folds} the list of CV folds' indices - either those passed through the \code{folds}
parameter or randomly generated.
\item \code{best_iteration} iteration number with the best evaluation metric value
(only available with early stopping).
\item \code{best_ntreelimit} and \code{ntreelimit} are deprecated attributes;
use \code{best_iteration} instead.
\item \code{pred} CV prediction values, available when \code{prediction} is set.
It is either a vector or a matrix (see \code{\link{cb.cv.predict}}).
\item \code{models} a list of the CV folds' models. It is only available with the explicit
setting of the \code{cb.cv.predict(save_models = TRUE)} callback.
}
}
\description{
The cross validation function of xgboost.
}
\details{
The original sample is randomly partitioned into \code{nfold} equal size subsamples.
Of the \code{nfold} subsamples, a single subsample is retained as the validation data
for testing the model, and the remaining \code{nfold - 1} subsamples are used as training data.
The cross-validation process is then repeated \code{nfold} times, with each of the
\code{nfold} subsamples used exactly once as the validation data; at each of the
\code{nrounds} boosting iterations, the evaluation metric's mean and standard deviation
across the folds are recorded. All observations are used for both training and validation.

Adapted from \url{https://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29}
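
As an illustration of the \code{feval} contract described in the arguments above,
a minimal custom evaluation function could be sketched as follows (the
\code{evalerror} name and the \code{custom-mae} metric label are illustrative,
not part of the package):
\preformatted{
# Receives the current predictions and the training xgb.DMatrix,
# and must return list(metric = <name>, value = <value>).
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- mean(abs(preds - labels))
  list(metric = "custom-mae", value = err)
}
}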
}
\examples{
data(agaricus.train, package = "xgboost")
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label))
cv <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5,
             metrics = list("rmse", "auc"),
             max_depth = 3, eta = 1, objective = "binary:logistic")
print(cv)
print(cv, verbose = TRUE)
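
# A sketch of early stopping combined with per-fold test predictions;
# the specific values (nrounds = 20, early_stopping_rounds = 3) are
# illustrative choices, not recommended defaults.
cv2 <- xgb.cv(data = dtrain, nrounds = 20, nthread = 2, nfold = 5,
              metrics = list("auc"), max_depth = 3, eta = 1,
              objective = "binary:logistic",
              early_stopping_rounds = 3, maximize = TRUE,
              prediction = TRUE)
print(cv2$best_iteration)
head(cv2$pred)
}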