############################################ Using XGBoost External Memory Version (beta) ############################################ There is no big difference between using external memory version and in-memory version. The only difference is the filename format. The external memory version takes in the following filename format: .. code-block:: none filename#cacheprefix The ``filename`` is the normal path to libsvm file you want to load in, and ``cacheprefix`` is a path to a cache file that XGBoost will use for external memory cache. .. note:: External memory is not available with GPU algorithms External memory is not available when ``tree_method`` is set to ``gpu_exact`` or ``gpu_hist``. The following code was extracted from `demo/guide-python/external_memory.py `_: .. code-block:: python dtrain = xgb.DMatrix('../data/agaricus.txt.train#dtrain.cache') You can find that there is additional ``#dtrain.cache`` following the libsvm file, this is the name of cache file. For CLI version, simply add the cache suffix, e.g. ``"../data/agaricus.txt.train#dtrain.cache"``. **************** Performance Note **************** * the parameter ``nthread`` should be set to number of **physical** cores - Most modern CPUs use hyperthreading, which means a 4 core CPU may carry 8 threads - Set ``nthread`` to be 4 for maximum performance in such case ******************* Distributed Version ******************* The external memory mode naturally works on distributed version, you can simply set path like .. code-block:: none data = "hdfs://path-to-data/#dtrain.cache" XGBoost will cache the data to the local position. When you run on YARN, the current folder is temporal so that you can directly use ``dtrain.cache`` to cache to current folder. ********** Usage Note ********** * This is a experimental version * Currently only importing from libsvm format is supported - Contribution of ingestion from other common external memory data source is welcomed