% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/lgb.cv.R
\name{lgb.cv}
\alias{lgb.cv}
\title{Main CV logic for LightGBM}
\usage{
lgb.cv(
  params = list(),
  data,
  nrounds = 100L,
  nfold = 3L,
  label = NULL,
  weight = NULL,
  obj = NULL,
  eval = NULL,
  verbose = 1L,
  record = TRUE,
  eval_freq = 1L,
  showsd = TRUE,
  stratified = TRUE,
  folds = NULL,
  init_model = NULL,
  colnames = NULL,
  categorical_feature = NULL,
  early_stopping_rounds = NULL,
  callbacks = list(),
  reset_data = FALSE,
  serializable = TRUE,
  eval_train_metric = FALSE
)
}
\arguments{
\item{params}{a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
the "Parameters" section of the documentation} for a list of parameters and valid values.}

\item{data}{a \code{lgb.Dataset} object, used for training. Some functions, such as \code{\link{lgb.cv}},
may allow you to pass other types of data like \code{matrix} and then separately supply
\code{label} as a keyword argument.}

\item{nrounds}{number of training rounds}

\item{nfold}{the original dataset is randomly partitioned into \code{nfold} equal-size subsamples.}

\item{label}{Deprecated. See "Deprecated Arguments" section below.}

\item{weight}{Deprecated. See "Deprecated Arguments" section below.}

\item{obj}{objective function, can be character or custom objective function. Examples include
\code{regression}, \code{regression_l1}, \code{huber}, \code{binary}, \code{lambdarank},
\code{multiclass}}

\item{eval}{evaluation function(s). This can be a character vector, function, or list with
a mixture of strings and functions.

\itemize{
    \item{\bold{a. character vector}:
        If you provide a character vector to this argument, it should contain strings with valid
        evaluation metrics.
        See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html#metric}{
        the "metric" section of the documentation}
        for a list of valid metrics.
    }
    \item{\bold{b. function}:
        You can provide a custom evaluation function. This
        should accept the keyword arguments \code{preds} and \code{dtrain} and should return a named
        list with three elements (see the sketch after this list):
        \itemize{
            \item{\code{name}: A string with the name of the metric, used for printing
                and storing results.
            }
            \item{\code{value}: A single number indicating the value of the metric for the
                given predictions and true values
            }
            \item{
                \code{higher_better}: A boolean indicating whether higher values indicate a better fit.
                For example, this would be \code{FALSE} for metrics like MAE or RMSE.
            }
        }
    }
    \item{\bold{c. list}:
        If a list is given, it should only contain character vectors and functions.
        These should follow the requirements from the descriptions above.
    }
}}
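As an illustration, a custom evaluation function satisfying the contract above might
look like the following sketch. The names \code{mae_eval} and \code{"custom_mae"} are
invented for this example; labels are retrieved from the \code{lgb.Dataset} with
\code{get_field()}.

\preformatted{
# hypothetical custom evaluation function: mean absolute error
mae_eval <- function(preds, dtrain) {
    labels <- get_field(dtrain, "label")
    list(
        name = "custom_mae"                  # used for printing and storing results
        , value = mean(abs(preds - labels))  # a single number for these predictions
        , higher_better = FALSE              # lower MAE indicates a better fit
    )
}
}

Such a function could then be passed directly (\code{eval = mae_eval}) or mixed with
built-in metrics in a list (\code{eval = list("l2", mae_eval)}).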
\item{verbose}{verbosity for output, if <= 0 and \code{valids} has been provided, also will disable the
printing of evaluation during training}

\item{record}{Boolean, TRUE will record iteration message to \code{booster$record_evals}}

\item{eval_freq}{evaluation output frequency, only effective when verbose > 0 and \code{valids} has been provided}

\item{showsd}{\code{boolean}, whether to show standard deviation of cross validation.
This parameter defaults to \code{TRUE}. Setting it to \code{FALSE} can lead to a
slight speedup by avoiding unnecessary computation.}

\item{stratified}{a \code{boolean} indicating whether sampling of folds should be stratified
by the values of outcome labels.}

\item{folds}{\code{list} provides a possibility to use a list of pre-defined CV folds
(each element must be a vector of test fold indices). When folds are supplied,
the \code{nfold} and \code{stratified} parameters are ignored (a sketch appears in
the examples below).}

\item{init_model}{path of model file or \code{lgb.Booster} object, will continue training from this model}

\item{colnames}{Deprecated. See "Deprecated Arguments" section below.}

\item{categorical_feature}{Deprecated. See "Deprecated Arguments" section below.}

\item{early_stopping_rounds}{int. Activates early stopping. When this parameter is non-null,
training will stop if the evaluation of any metric on any validation set
fails to improve for \code{early_stopping_rounds} consecutive boosting rounds.
If training stops early, the returned model will have attribute \code{best_iter}
set to the iteration number of the best iteration.}

\item{callbacks}{List of callback functions that are applied at each iteration.}

\item{reset_data}{Boolean, setting it to TRUE (not the default value) will transform the booster model
into a predictor model, which frees up memory and the original datasets}

\item{serializable}{whether to make the resulting objects serializable through functions such as
\code{save} or \code{saveRDS} (see section "Model serialization").}

\item{eval_train_metric}{\code{boolean}, whether to add the cross validation results on the
training data. This parameter defaults to \code{FALSE}. Setting it to \code{TRUE} will increase run time.}
}
\value{
a trained model \code{lgb.CVBooster}.
}
\description{
Cross validation logic used by LightGBM
}
\section{Deprecated Arguments}{

A future release of \code{lightgbm} will require passing an \code{lgb.Dataset}
to argument \code{'data'}. It will also remove support for passing arguments
\code{'categorical_feature'}, \code{'colnames'}, \code{'label'}, and \code{'weight'}.
}

\section{Early Stopping}{

"early stopping" refers to stopping the training process if the model's performance on a given
validation set does not improve for several consecutive iterations.

If multiple arguments are given to \code{eval}, their order will be preserved. If you enable
early stopping by setting \code{early_stopping_rounds} in \code{params}, by default all
metrics will be considered for early stopping.

If you want to only consider the first metric for early stopping, pass
\code{first_metric_only = TRUE} in \code{params}. Note that if you also specify \code{metric}
in \code{params}, that metric will be considered the "first" one. If you omit \code{metric},
a default metric will be used based on your choice for the parameter \code{obj} (keyword argument)
or \code{objective} (passed into \code{params}).
}

\examples{
\donttest{
\dontshow{setLGBMthreads(2L)}
\dontshow{data.table::setDTthreads(1L)}
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
params <- list(
  objective = "regression"
  , metric = "l2"
  , min_data = 1L
  , learning_rate = 1.0
  , num_threads = 2L
)
model <- lgb.cv(
  params = params
  , data = dtrain
  , nrounds = 5L
  , nfold = 3L
)
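
# The remaining examples are illustrative sketches layered on the objects above;
# they are not canonical usage from the package documentation.

# pre-defined CV folds: each list element holds the row indices of one test fold
# (the ranges below assume the 6513-row agaricus training data)
custom_folds <- list(
  1L:2171L
  , 2172L:4342L
  , 4343L:6513L
)
model_custom_folds <- lgb.cv(
  params = params
  , data = dtrain
  , nrounds = 5L
  , folds = custom_folds  # nfold and stratified are ignored when folds is given
)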
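
# early stopping: training stops if the l2 metric fails to improve on any
# validation fold for 3 consecutive rounds; the best iteration is stored in
# the returned booster's best_iter attribute
model_early_stop <- lgb.cv(
  params = params
  , data = dtrain
  , nrounds = 100L
  , nfold = 3L
  , early_stopping_rounds = 3L
)
}
}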