dask_ml.xgboost.train(client, params, data, labels, dmatrix_kwargs={}, evals_result=None, sample_weight=None, **kwargs)

Train an XGBoost model on a Dask cluster.

This starts XGBoost on all Dask workers, moves input data to those workers, and then calls xgboost.train on the inputs.

client: dask.distributed.Client
params: dict

Parameters to pass to XGBoost (see xgboost.train)

data: dask.array or dask.dataframe
labels: dask.array or dask.dataframe
dmatrix_kwargs: Keywords to give to XGBoost DMatrix
evals_result: dict, optional

Stores the evaluation result history of all the items in the eval_set by mutating evals_result in place.

sample_weight: array_like, optional

Instance weights.

**kwargs: Keywords to give to xgboost.train
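
The in-place behaviour of evals_result described above can be illustrated without a cluster. The sketch below uses a hypothetical stand-in, toy_train, which is not part of dask-ml or XGBoost; it only shows why the caller passes an empty dict and reads it after the call. The nested layout (evaluation name, then metric name, then one value per round) mirrors the one xgboost.train uses; the key names and numbers are made up for illustration.

```python
def toy_train(round_errors, evals_result=None):
    """Hypothetical stand-in for a trainer; not part of dask-ml or XGBoost."""
    if evals_result is not None:
        # Mutate the caller's dict instead of rebinding the local name,
        # so the caller sees the history after the call returns.
        evals_result["validation"] = {"error": list(round_errors)}
    return "booster-placeholder"


history = {}  # starts empty; filled in by the call below
model = toy_train([0.31, 0.24, 0.2], evals_result=history)

# The metric history is read from the dict, not from the return value.
print(history["validation"]["error"][-1])
```

The same pattern applies to train: create the dict first, pass it as evals_result, and inspect it once training has finished.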


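>>> from dask.distributed import Client
>>> import dask.dataframe as dd
>>> from dask_ml.xgboost import train
>>> # params and normal_kwargs are assumed to be defined by the caller
>>> client = Client('scheduler-address:8786')
>>> data = dd.read_csv('s3://...')
>>> labels = data['outcome']
>>> del data['outcome']
>>> train(client, params, data, labels, **normal_kwargs)
<xgboost.core.Booster object at ...>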
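
For trying the call without remote data, a minimal sketch along the following lines may help. It assumes an environment where dask, distributed, and dask_ml.xgboost (with its XGBoost dependencies) are importable. The helper names example_params and run_example, the parameter values, and the synthetic data are illustrative assumptions, not part of the library.

```python
def example_params():
    """Illustrative XGBoost parameters; the values are assumptions, not recommendations."""
    return {"objective": "binary:logistic", "max_depth": 4, "eta": 0.1}


def run_example(scheduler_address=None):
    """Train on synthetic data; a hypothetical helper, not a dask-ml function."""
    # Imports are local so this module loads even without the optional dependencies.
    import dask.array as da
    from dask.distributed import Client
    from dask_ml.xgboost import train

    # With no address, Client() starts a local cluster on this machine.
    client = Client(scheduler_address)

    # Synthetic data: 1000 rows, 10 features, in 10 row-wise chunks.
    data = da.random.random((1000, 10), chunks=(100, 10))
    # Random binary labels, chunked to line up with the rows of data.
    labels = (da.random.random(1000, chunks=100) > 0.5).astype("int64")

    # Extra keyword arguments such as num_boost_round are passed on to xgboost.train.
    booster = train(client, example_params(), data, labels, num_boost_round=10)
    client.close()
    return booster
```

Calling run_example() with no arguments uses a local cluster; passing an address such as 'scheduler-address:8786' connects to an existing scheduler instead.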