dask_ml.xgboost.train

dask_ml.xgboost.train(client, params, data, labels, dmatrix_kwargs={}, **kwargs)

Train an XGBoost model on a Dask cluster

This starts XGBoost on all Dask workers, moves input data to those workers, and then calls xgboost.train on the inputs.

Parameters:
client: dask.distributed.Client
params: dict

Parameters to give to XGBoost (see xgboost.train)

data: dask.array or dask.dataframe
labels: dask.array or dask.dataframe
dmatrix_kwargs: Keywords to give to XGBoost DMatrix
**kwargs: Keywords to give to XGBoost train

See also

predict

Examples

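>>> from dask.distributed import Client  # doctest: +SKIP
>>> import dask.dataframe as dd  # doctest: +SKIP
>>> from dask_ml.xgboost import train  # doctest: +SKIP
>>> client = Client('scheduler-address:8786')  # doctest: +SKIP
>>> data = dd.read_csv('s3://...')  # doctest: +SKIP
>>> labels = data['outcome']  # doctest: +SKIP
>>> del data['outcome']  # doctest: +SKIP
>>> train(client, params, data, labels, **normal_kwargs)  # doctest: +SKIP
<xgboost.core.Booster object at ...>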
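
The example above uses params and normal_kwargs without defining them. A minimal sketch of what they might contain follows. The specific values are illustrative assumptions, not recommendations from this page: the keys are standard XGBoost options, and the binary objective assumes that 'outcome' is a 0/1 column.

```python
# Hypothetical settings for illustration only; tune them for your own data.

# Booster parameters, passed as the `params` argument.
params = {
    "objective": "binary:logistic",  # assumes 'outcome' is a 0/1 label
    "max_depth": 4,
    "eta": 0.1,
}

# Extra keywords forwarded to xgboost.train through **kwargs.
normal_kwargs = {"num_boost_round": 50}

# Keywords forwarded to the XGBoost DMatrix constructor on each worker.
# Here -999.0 is an assumed sentinel that marks missing values in the data.
dmatrix_kwargs = {"missing": -999.0}

# With a running cluster, the call would then look like:
# train(client, params, data, labels,
#       dmatrix_kwargs=dmatrix_kwargs, **normal_kwargs)
```

Keeping the three dictionaries separate mirrors the signature: params configures the booster, dmatrix_kwargs configures how each worker builds its DMatrix, and the remaining keywords go to the training call itself.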