dask_ml.xgboost.XGBClassifier

class dask_ml.xgboost.XGBClassifier(max_depth=3, learning_rate=0.1, n_estimators=100, silent=True, objective='binary:logistic', booster='gbtree', n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None, missing=None, **kwargs)

Attributes
coef_
Coefficients property
feature_importances_
Feature importances property
intercept_
Intercept (bias) property
Methods
apply(self, X[, ntree_limit])
Return the predicted leaf of every tree for each sample.
evals_result(self)
Return the evaluation results.
fit(self, X[, y, classes])
Fit a gradient boosting classifier.
get_booster(self)
Get the underlying xgboost Booster of this model.
get_params(self[, deep])
Get parameters.
get_xgb_params(self)
Get xgboost type parameters.
load_model(self, fname)
Load the model from a file.
predict(self, X)
Predict with data.
predict_proba(self, data[, ntree_limit])
Predict the probability of each data example being of a given class.
save_model(self, fname)
Save the model to a file.
score(self, X, y[, sample_weight])
Return the mean accuracy on the given test data and labels.
set_params(self, **params)
Set the parameters of this estimator.

__init__(self, max_depth=3, learning_rate=0.1, n_estimators=100, silent=True, objective='binary:logistic', booster='gbtree', n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None, missing=None, **kwargs)
Initialize self. See help(type(self)) for accurate signature.

apply(self, X, ntree_limit=0)
Return the predicted leaf of every tree for each sample.
Parameters
X : array_like, shape=[n_samples, n_features]
Input features matrix.
ntree_limit : int
Limit the number of trees used in the prediction; defaults to 0 (use all trees).
Returns
X_leaves : array_like, shape=[n_samples, n_trees]
For each data point x in X and for each tree, the index of the leaf that x ends up in. Leaves are numbered within [0, 2**(self.max_depth+1)), possibly with gaps in the numbering.
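A common use of the leaf indices returned by apply is to one-hot encode them as features for a downstream model. The sketch below relies only on the documented bound (every leaf index lies in [0, 2**(max_depth+1))) and uses plain NumPy; the helper name one_hot_leaves and the toy leaf array are hypothetical, not part of dask-ml.

```python
import numpy as np


def one_hot_leaves(X_leaves, max_depth):
    """One-hot encode leaf indices of shape (n_samples, n_trees).

    Each tree gets a block of 2**(max_depth + 1) columns, the documented
    upper bound on leaf indices; unused columns (numbering gaps) stay zero.
    """
    X_leaves = np.asarray(X_leaves, dtype=np.int64)
    n_samples, n_trees = X_leaves.shape
    width = 2 ** (max_depth + 1)
    if X_leaves.min() < 0 or X_leaves.max() >= width:
        raise ValueError("leaf index outside [0, 2**(max_depth+1))")
    out = np.zeros((n_samples, n_trees * width), dtype=np.int8)
    # Row index for every (sample, tree) pair, in row-major order.
    rows = np.repeat(np.arange(n_samples), n_trees)
    # Offset each tree's leaf index into that tree's block of columns.
    cols = (X_leaves + np.arange(n_trees) * width).ravel()
    out[rows, cols] = 1
    return out


# Hypothetical leaf indices for 2 samples and 3 trees with max_depth=1 (block width 4).
leaves = np.array([[1, 2, 3], [2, 1, 1]])
encoded = one_hot_leaves(leaves, max_depth=1)
```

Fixing the block width from max_depth keeps the column layout identical across batches, at the cost of some always-zero columns where the numbering has gaps.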

coef_
Coefficients property.
Note: Coefficients are defined only for linear learners
Coefficients are only defined when the linear model is chosen as the base learner (booster=gblinear). They are not defined for other base learner types, such as tree learners (booster=gbtree).
Returns
coef_ : array of shape [n_features] or [n_classes, n_features]

evals_result(self)
Return the evaluation results.
If eval_set is passed to the fit function, you can call evals_result() to get evaluation results for all passed eval_sets. When eval_metric is also passed to the fit function, evals_result will contain the eval_metrics passed to the fit function.
Returns
evals_result : dictionary
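The returned dictionary maps each evaluation set to its metrics, each holding one value per boosting round. The layout used below ({"validation_0": {"logloss": [...]}}) follows the upstream xgboost scikit-learn wrapper and is an assumption here; the fit notes state that eval_set and eval_metric are not supported by this dask-ml wrapper, so this only illustrates how such a dictionary could be read. The helper best_round and the numbers are hypothetical.

```python
def best_round(evals_result, eval_name, metric, minimize=True):
    """Return (round_index, value) of the best score for one metric."""
    history = evals_result[eval_name][metric]
    pick = min if minimize else max
    # Compare rounds by their metric value; ties resolve to the earliest round.
    best = pick(range(len(history)), key=lambda i: history[i])
    return best, history[best]


# Hypothetical results for one evaluation set tracked over four rounds.
results = {"validation_0": {"logloss": [0.61, 0.52, 0.49, 0.50]}}
round_idx, value = best_round(results, "validation_0", "logloss")
```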

feature_importances_
Feature importances property.
Note: Feature importance is defined only for tree boosters
Feature importance is only defined when the decision tree model is chosen as the base learner (booster=gbtree). It is not defined for other base learner types, such as linear learners (booster=gblinear).
Returns
feature_importances_ : array of shape [n_features]
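Because feature_importances_ is an array with one entry per input column, ranking features only needs the matching column names. The sketch below uses NumPy only; the helper top_features, the feature names, and the importance values are hypothetical stand-ins for a fitted model's attribute.

```python
import numpy as np


def top_features(importances, names, k=3):
    """Return the k (name, importance) pairs with the largest importance."""
    importances = np.asarray(importances, dtype=float)
    if importances.shape != (len(names),):
        raise ValueError("expected one importance per feature name")
    # Stable sort on negated scores: descending order, ties keep column order.
    order = np.argsort(-importances, kind="stable")[:k]
    return [(names[i], float(importances[i])) for i in order]


# Hypothetical values standing in for clf.feature_importances_.
ranked = top_features([0.1, 0.5, 0.0, 0.4], ["age", "income", "zip", "tenure"], k=2)
```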

fit(self, X, y=None, classes=None)
Fit a gradient boosting classifier.
Parameters
X : array-like [n_samples, n_features]
Feature matrix. May be a dask.array or dask.dataframe.
y : array-like
Labels.
classes : sequence, optional
The unique values in y. If not specified, these will be eagerly computed from y before training.
Returns
self : XGBClassifier
Notes
This differs from the XGBoost version in three ways:
1. The sample_weight, eval_set, eval_metric, early_stopping_rounds and verbose fit kwargs are not supported.
2. The labels are not automatically label-encoded.
3. The classes_ and n_classes_ attributes are not learned.
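Since labels are not label-encoded automatically and classes_ is not learned, one option is to encode the labels yourself before calling fit and keep the mapping for decoding predictions. This is a minimal NumPy sketch that assumes the labels fit in memory; the helpers encode_labels and decode_labels are hypothetical, and the fit call appears only as a comment.

```python
import numpy as np


def encode_labels(y):
    """Map arbitrary labels to integer codes 0..k-1.

    Returns (classes, codes) such that classes[codes] reconstructs y.
    """
    classes, codes = np.unique(np.asarray(y), return_inverse=True)
    return classes, codes


def decode_labels(classes, codes):
    """Map integer predictions back to the original label values."""
    return classes[np.asarray(codes)]


y = np.array(["spam", "ham", "spam", "eggs"])
classes, y_codes = encode_labels(y)
# The encoded labels could then be passed to fit along with the known classes,
# e.g. clf.fit(X, y_codes, classes=np.arange(len(classes))), which avoids
# computing the unique values of y eagerly before training.
```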

get_booster(self)
Get the underlying xgboost Booster of this model.
This raises an exception if fit has not been called.
Returns
booster : the xgboost Booster of the underlying model

get_params(self, deep=False)
Get parameters.

get_xgb_params(self)
Get xgboost type parameters.

intercept_
Intercept (bias) property.
Note: Intercept is defined only for linear learners
Intercept (bias) is only defined when the linear model is chosen as the base learner (booster=gblinear). It is not defined for other base learner types, such as tree learners (booster=gbtree).
Returns
intercept_ : array of shape (1,) or [n_classes]
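For a linear base learner (booster=gblinear), coef_ and intercept_ together define a linear score per sample. The sketch below assumes the raw margin is X @ coef_.T + intercept_ with no other offset (for example, that base_score contributes nothing), and that binary:logistic maps the margin to a probability through the logistic sigmoid; the coefficient values are hypothetical and the helpers are illustrative, not part of dask-ml.

```python
import numpy as np


def linear_margin(X, coef, intercept):
    """Raw margin for a linear learner (assumed: X @ coef.T + intercept).

    coef has shape [n_features] or [n_classes, n_features];
    intercept has shape (1,) or [n_classes], matching the attributes above.
    """
    X = np.asarray(X, dtype=float)
    return X @ np.asarray(coef, dtype=float).T + np.asarray(intercept, dtype=float)


def sigmoid(z):
    """Logistic function mapping a margin to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical binary model with two features.
X = np.array([[1.0, 2.0], [0.0, 0.0]])
margin = linear_margin(X, coef=[0.5, 0.25], intercept=[-1.0])
proba = sigmoid(margin)
```

Transposing coef_ lets the same expression handle both documented shapes: a 1-D coef_ yields one margin per sample, while a [n_classes, n_features] coef_ yields one margin per sample and class.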

load_model(self, fname)
Load the model from a file.
The model is loaded from an XGBoost internal binary format, which is universal among the various XGBoost interfaces. Auxiliary attributes of the Python Booster object (such as feature names) will not be loaded. Label encodings (text labels to numeric labels) will also be lost. If you are using only the Python interface, we recommend pickling the model object for best results.
Parameters
fname : string or a memory buffer
Input file name or memory buffer (see also save_raw).
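Following the recommendation to pickle the whole model object when only the Python interface is used, a round trip through the standard pickle module might look like the sketch below. The helper names and file path are hypothetical, and a plain dictionary stands in for a fitted estimator so the sketch runs without a cluster; only unpickle files you trust.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Serialize a Python object (e.g. a fitted estimator) to path."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Load an object written by save_pickle; only use on trusted files."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a fitted model, including Python-side state (a label mapping)
# of the kind the binary format would not preserve.
model_state = {"classes": ["eggs", "ham", "spam"], "max_depth": 3}
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_pickle(model_state, path)
restored = load_pickle(path)
```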

predict(self, X)
Predict with data.
Note: This function is not thread safe.
For each booster object, predict can only be called from one thread. If you want to run prediction using multiple threads, call xgb.copy() to make copies of the model object and then call predict().
Note: Using predict() with a DART booster
If the booster object is of DART type, predict() will perform dropouts, i.e. only some of the trees will be evaluated. This will produce incorrect results if data is not the training data. To obtain correct results on test sets, set ntree_limit to a nonzero value, e.g. preds = bst.predict(dtest, ntree_limit=num_round).
Parameters
data : DMatrix
The DMatrix storing the input.
output_margin : bool
Whether to output the raw untransformed margin value.
ntree_limit : int
Limit the number of trees in the prediction; defaults to best_ntree_limit if defined (i.e. it has been trained with early stopping), otherwise 0 (use all trees).
validate_features : bool
When this is True, validate that the Booster’s and data’s feature_names are identical. Otherwise, it is assumed that the feature_names are the same.
Returns
prediction : numpy array
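The thread-safety note amounts to never sharing one model object between threads. A minimal sketch of that pattern gives every worker thread its own private copy; it uses copy.deepcopy on a hypothetical stand-in model (ThresholdModel) so it runs without xgboost, whereas for an xgboost Booster the note above suggests its copy() method instead.

```python
import copy
import threading
from concurrent.futures import ThreadPoolExecutor

import numpy as np


class ThresholdModel:
    """Hypothetical stand-in for a fitted model exposing predict()."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, X):
        return (np.asarray(X).sum(axis=1) > self.threshold).astype(int)


def predict_in_threads(model, chunks, max_workers=4):
    """Predict chunks concurrently without sharing one model across threads."""
    local = threading.local()

    def work(chunk):
        # Each worker thread lazily makes its own private copy of the model.
        if not hasattr(local, "model"):
            local.model = copy.deepcopy(model)
        return local.model.predict(chunk)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input chunks.
        return list(pool.map(work, chunks))


model = ThresholdModel(threshold=1.0)
chunks = [np.ones((2, 3)), np.zeros((2, 3)), np.full((1, 3), 0.5)]
preds = predict_in_threads(model, chunks)
```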

predict_proba(self, data, ntree_limit=None)
Predict the probability of each data example being of a given class.
Note: This function is not thread safe.
For each booster object, predict can only be called from one thread. If you want to run prediction using multiple threads, call xgb.copy() to make copies of the model object and then call predict.
Parameters
data : DMatrix
The DMatrix storing the input.
ntree_limit : int
Limit the number of trees in the prediction; defaults to best_ntree_limit if defined (i.e. it has been trained with early stopping), otherwise 0 (use all trees).
validate_features : bool
When this is True, validate that the Booster’s and data’s feature_names are identical. Otherwise, it is assumed that the feature_names are the same.
Returns
prediction : numpy array
A numpy array with the probability of each data example being of a given class.
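When the probabilities come back with one column per class, hard labels can be recovered with an argmax over the columns. This sketch assumes a 2-D output of shape (n_samples, n_classes) whose columns follow the order of the encoded classes; since fit does not label-encode, the class array has to come from your own encoding step. The helper proba_to_labels, the probabilities, and the class names are hypothetical.

```python
import numpy as np


def proba_to_labels(proba, classes):
    """Pick the most probable class for each row of a probability matrix."""
    proba = np.asarray(proba, dtype=float)
    if proba.ndim != 2 or proba.shape[1] != len(classes):
        raise ValueError("expected shape (n_samples, n_classes)")
    # argmax returns the first column on ties.
    return np.asarray(classes)[np.argmax(proba, axis=1)]


proba = np.array([[0.2, 0.8], [0.9, 0.1], [0.5, 0.5]])
labels = proba_to_labels(proba, classes=["neg", "pos"])
```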

save_model(self, fname)
Save the model to a file.
The model is saved in an XGBoost internal binary format, which is universal among the various XGBoost interfaces. Auxiliary attributes of the Python Booster object (such as feature names) will not be saved. Label encodings (text labels to numeric labels) will also be lost. If you are using only the Python interface, we recommend pickling the model object for best results.
Parameters
fname : string
Output file name.

score(self, X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.
In multilabel classification, this is the subset accuracy, a harsh metric because the entire label set of each sample must be predicted correctly.
Parameters
X : array-like, shape = (n_samples, n_features)
Test samples.
y : array-like, shape = (n_samples) or (n_samples, n_outputs)
True labels for X.
sample_weight : array-like, shape = [n_samples], optional
Sample weights.
Returns
score : float
Mean accuracy of self.predict(X) with respect to y.
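The definitions above (mean accuracy, optional per-sample weights, and subset accuracy for 2-D multilabel targets) can be made concrete with a short NumPy sketch. It is an illustrative re-implementation under those definitions, not the code score actually runs; mean_accuracy is a hypothetical helper.

```python
import numpy as np


def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Mean (optionally weighted) accuracy.

    For 2-D multilabel input, a sample counts as correct only if every
    label in its row matches (subset accuracy).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    correct = y_true == y_pred
    if correct.ndim == 2:
        # Subset accuracy: the whole label row must match.
        correct = correct.all(axis=1)
    return float(np.average(correct, weights=sample_weight))


acc = mean_accuracy([0, 1, 1, 0], [0, 1, 0, 0])
weighted = mean_accuracy([0, 1, 1, 0], [0, 1, 0, 0], sample_weight=[1, 1, 2, 0])
subset = mean_accuracy([[1, 0], [1, 1]], [[1, 0], [0, 1]])
```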

set_params(self, **params)
Set the parameters of this estimator.
Modification of the sklearn method to allow unknown kwargs. This allows using the full range of xgboost parameters that are not defined as member variables in sklearn grid search.
Returns
self
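The idea of a set_params that tolerates unknown keyword arguments can be sketched with a small hypothetical class: names matching an existing attribute are set directly, and everything else is collected in a kwargs dictionary. This mirrors the behaviour described above but is not the actual xgboost or dask-ml code; tree_method is used only as an example of an extra xgboost parameter.

```python
class ParamsSketch:
    """Hypothetical estimator illustrating a set_params that keeps unknown keys."""

    def __init__(self, max_depth=3, learning_rate=0.1, **kwargs):
        self.max_depth = max_depth
        self.learning_rate = learning_rate
        # Parameters without a dedicated attribute are kept here.
        self.kwargs = dict(kwargs)

    def set_params(self, **params):
        for key, value in params.items():
            if key != "kwargs" and hasattr(self, key):
                setattr(self, key, value)
            else:
                # Unknown names are stored rather than rejected, so extra
                # booster parameters can still be tuned, e.g. in a grid search.
                self.kwargs[key] = value
        return self


est = ParamsSketch().set_params(max_depth=5, tree_method="hist")
```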