dask_ml.xgboost.XGBRegressor

class dask_ml.xgboost.XGBRegressor(max_depth=3, learning_rate=0.1, n_estimators=100, verbosity=1, silent=None, objective='reg:linear', booster='gbtree', n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, colsample_bynode=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None, missing=None, importance_type='gain', **kwargs)

Attributes:
  coef_ : Coefficients property
  feature_importances_ : Feature importances property
  intercept_ : Intercept (bias) property
Methods
  apply(self, X[, ntree_limit]) : Return the predicted leaf of every tree for each sample.
  evals_result(self) : Return the evaluation results.
  fit(self, X[, y, eval_set, …]) : Fit the gradient boosting model.
  get_booster(self) : Get the underlying xgboost Booster of this model.
  get_num_boosting_rounds(self) : Gets the number of xgboost boosting rounds.
  get_params(self[, deep]) : Get parameters.
  get_xgb_params(self) : Get xgboost type parameters.
  load_model(self, fname) : Load the model from a file.
  predict(self, X) : Predict with data.
  save_model(self, fname) : Save the model to a file.
  score(self, X, y[, sample_weight]) : Return the coefficient of determination R^2 of the prediction.
  set_params(self, **params) : Set the parameters of this estimator.

__init__(self, max_depth=3, learning_rate=0.1, n_estimators=100, verbosity=1, silent=None, objective='reg:linear', booster='gbtree', n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, colsample_bynode=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None, missing=None, importance_type='gain', **kwargs)
Initialize self. See help(type(self)) for accurate signature.

apply(self, X, ntree_limit=0)
Return the predicted leaf of every tree for each sample.
Parameters:
  X : array_like, shape=[n_samples, n_features]
    Input features matrix.
  ntree_limit : int
    Limit number of trees in the prediction; defaults to 0 (use all trees).
Returns:
  X_leaves : array_like, shape=[n_samples, n_trees]
    For each datapoint x in X and for each tree, return the index of the leaf x ends up in. Leaves are numbered within [0; 2**(self.max_depth+1)), possibly with gaps in the numbering.
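
The documented range for leaf indices can be checked with simple arithmetic. The following is a minimal sketch in plain Python, not library code: the helper names leaf_index_bound and check_leaves are hypothetical, and example_leaves is made-up data standing in for an apply() result with max_depth=3.

```python
def leaf_index_bound(max_depth):
    """Exclusive upper bound on leaf indices: 2 ** (max_depth + 1)."""
    return 2 ** (max_depth + 1)


def check_leaves(x_leaves, max_depth):
    """Return True if every leaf index lies in [0, 2 ** (max_depth + 1))."""
    bound = leaf_index_bound(max_depth)
    return all(0 <= leaf < bound for row in x_leaves for leaf in row)


# Made-up matrix of shape [n_samples, n_trees]; not real model output.
example_leaves = [
    [7, 8, 14],
    [9, 8, 11],
]
bound = leaf_index_bound(3)
ok = check_leaves(example_leaves, 3)
```

With max_depth=3 the bound is 16, so valid indices run from 0 to 15. Not every value in that range has to occur, which is what "gaps in the numbering" refers to.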

coef_
Coefficients property
Note
Coefficients are defined only for linear learners
Coefficients are only defined when the linear model is chosen as base learner (booster=gblinear). They are not defined for other base learner types, such as tree learners (booster=gbtree).
Returns:
  coef_ : array of shape [n_features] or [n_classes, n_features]

evals_result(self)
Return the evaluation results.
If eval_set is passed to the fit function, you can call evals_result() to get evaluation results for all passed eval_sets. When eval_metric is also passed to the fit function, the evals_result will contain the eval_metrics passed to the fit function.
Returns:
  evals_result : dictionary

feature_importances_
Feature importances property
Note
Feature importance is defined only for tree boosters
Feature importance is only defined when the decision tree model is chosen as base learner (booster=gbtree). It is not defined for other base learner types, such as linear learners (booster=gblinear).
Returns:
  feature_importances_ : array of shape [n_features]

fit(self, X, y=None, eval_set=None, sample_weight_eval_set=None, eval_metric=None, early_stopping_rounds=None)
Fit the gradient boosting model.
Parameters:
  X : array-like [n_samples, n_features]
  y : array-like
Returns:
  self : the fitted Regressor
Notes
This differs from the XGBoost version in not supporting the eval_set, eval_metric, early_stopping_rounds and verbose fit kwargs.
  eval_set : list, optional
    A list of (X, y) tuple pairs to use as validation sets, for which metrics will be computed. Validation metrics will help us track the performance of the model.
  sample_weight_eval_set : list, optional
    A list of the form [L_1, L_2, …, L_n], where each L_i is a list of instance weights on the i-th validation set.
  eval_metric : str, list of str, or callable, optional
    If a str, should be a built-in evaluation metric to use. See doc/parameter.rst.
    If a list of str, should be the list of multiple built-in evaluation metrics to use.
    If callable, a custom evaluation metric. The call signature is func(y_predicted, y_true), where y_true will be a DMatrix object such that you may need to call the get_label method. It must return a (str, value) pair, where the str is a name for the evaluation and value is the value of the evaluation function. The callable custom objective is always minimized.
  early_stopping_rounds : int
    Activates early stopping. Validation metric needs to improve at least once in every early_stopping_rounds round(s) to continue training.
    Requires at least one item in eval_set.
    The method returns the model from the last iteration (not the best one).
    If there's more than one item in eval_set, the last entry will be used for early stopping.
    If there's more than one metric in eval_metric, the last metric will be used for early stopping.
    If early stopping occurs, the model will have three additional fields: clf.best_score, clf.best_iteration and clf.best_ntree_limit.
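
The callable form of eval_metric and the early stopping rule described above can be illustrated without any library. This is a sketch under stated assumptions, not the xgboost or dask_ml implementation: FakeDMatrix is a hypothetical stand-in that only provides get_label(), rmse_metric is an illustrative metric with the documented func(y_predicted, y_true) signature, and simulate_early_stopping replays a made-up list of per-round validation scores, treating lower as better.

```python
import math


class FakeDMatrix:
    """Hypothetical stand-in exposing only get_label(); not the xgboost class."""

    def __init__(self, labels):
        self._labels = list(labels)

    def get_label(self):
        return self._labels


def rmse_metric(y_predicted, y_true):
    """Custom metric with the documented signature; returns (name, value)."""
    labels = y_true.get_label()
    squared = [(p - t) ** 2 for p, t in zip(y_predicted, labels)]
    return "custom_rmse", math.sqrt(sum(squared) / len(squared))


def simulate_early_stopping(scores, early_stopping_rounds):
    """Replay per-round validation scores (lower is better).

    Returns (last_iteration, best_iteration, best_score). The loop stops once
    early_stopping_rounds rounds have passed without a new best score.
    """
    best_score = float("inf")
    best_iteration = -1
    last_iteration = -1
    for iteration, score in enumerate(scores):
        last_iteration = iteration
        if score < best_score:
            best_score, best_iteration = score, iteration
        elif iteration - best_iteration >= early_stopping_rounds:
            break
    return last_iteration, best_iteration, best_score


name, value = rmse_metric([1.0, 2.0, 4.0], FakeDMatrix([1.0, 2.0, 2.0]))
history = [0.9, 0.7, 0.6, 0.65, 0.61, 0.62, 0.5]  # made-up scores
stopped_at, best_at, best = simulate_early_stopping(history, 3)
```

In this replay the best score occurs at round 2 while the loop ends at round 5, which mirrors the statement that the returned model comes from the last iteration rather than the best one.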

get_booster(self)
Get the underlying xgboost Booster of this model.
This will raise an exception if fit has not been called.
Returns:
  booster : an xgboost Booster of the underlying model

get_num_boosting_rounds(self)
Gets the number of xgboost boosting rounds.

get_params(self, deep=False)
Get parameters.

get_xgb_params(self)
Get xgboost type parameters.

intercept_
Intercept (bias) property
Note
Intercept is defined only for linear learners
Intercept (bias) is only defined when the linear model is chosen as base learner (booster=gblinear). It is not defined for other base learner types, such as tree learners (booster=gbtree).
Returns:
  intercept_ : array of shape (1,) or [n_classes]

load_model(self, fname)
Load the model from a file.
The model is loaded from an XGBoost internal binary format which is universal among the various XGBoost interfaces. Auxiliary attributes of the Python Booster object (such as feature names) will not be loaded. Label encodings (text labels to numeric labels) will also be lost. If you are using only the Python interface, we recommend pickling the model object for best results.
Parameters:
  fname : string or a memory buffer
    Input file name or memory buffer (see also save_raw)

predict(self, X)
Predict with data.
Note
This function is not thread safe.
For each booster object, predict can only be called from one thread. If you want to run prediction using multiple threads, call xgb.copy() to make copies of the model object and then call predict().
Note
Using predict() with DART booster
If the booster object is DART type, predict() will perform dropouts, i.e. only some of the trees will be evaluated. This will produce incorrect results if data is not the training data. To obtain correct results on test sets, set ntree_limit to a nonzero value, e.g. preds = bst.predict(dtest, ntree_limit=num_round)
Parameters:
  data : DMatrix
    The dmatrix storing the input.
  output_margin : bool
    Whether to output the raw untransformed margin value.
  ntree_limit : int
    Limit number of trees in the prediction; defaults to best_ntree_limit if defined (i.e. it has been trained with early stopping), otherwise 0 (use all trees).
  validate_features : bool
    When this is True, validate that the Booster's and data's feature_names are identical. Otherwise, it is assumed that the feature_names are the same.
Returns:
  prediction : numpy array
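
The thread-safety note above amounts to a "one copy per thread" pattern. The sketch below shows that pattern with the standard library only. ToyModel is a hypothetical stand-in with copy() and predict() methods, not an xgboost Booster, and predict_in_threads is an illustrative helper name; real code would copy the actual model object instead.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class ToyModel:
    """Hypothetical stand-in for a fitted model; not part of xgboost."""

    def __init__(self, weight):
        self.weight = weight

    def copy(self):
        return copy.deepcopy(self)

    def predict(self, rows):
        return [self.weight * x for x in rows]


def predict_in_threads(model, chunks):
    """Predict each chunk in its own thread, each with a private model copy."""
    # One copy per chunk, so no two threads ever share a model object.
    copies = [model.copy() for _ in chunks]
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        jobs = zip(copies, chunks)
        return list(pool.map(lambda job: job[0].predict(job[1]), jobs))


results = predict_in_threads(ToyModel(2.0), [[1, 2], [3], [4, 5, 6]])
```

pool.map returns results in the order of the input chunks, so the predictions can be concatenated directly.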

save_model(self, fname)
Save the model to a file.
The model is saved in an XGBoost internal binary format which is universal among the various XGBoost interfaces. Auxiliary attributes of the Python Booster object (such as feature names) will not be saved. Label encodings (text labels to numeric labels) will also be lost. If you are using only the Python interface, we recommend pickling the model object for best results.
Parameters:
  fname : string
    Output file name

score(self, X, y, sample_weight=None)
Return the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.
Parameters:
  X : array-like of shape (n_samples, n_features)
    Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
  y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True values for X.
  sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.
Returns:
  score : float
    R^2 of self.predict(X) wrt. y.
Notes
The R2 score used when calling score on a regressor will use multioutput='uniform_average' from version 0.23 to keep consistent with r2_score(). This will influence the score method of all the multioutput regressors (except for MultiOutputRegressor). To specify the default value manually and avoid the warning, please either call r2_score() directly or make a custom scorer with make_scorer() (the built-in scorer 'r2' uses multioutput='uniform_average').
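
The R^2 definition above, 1 - u/v, can be worked through by hand. The following is a minimal plain-Python sketch of the unweighted, single-output case only; r2_score_sketch is a hypothetical helper name and the numbers are made-up data. It assumes y_true is not constant, since v would then be zero.

```python
def r2_score_sketch(y_true, y_pred):
    """Unweighted R^2 = 1 - u/v for a single output."""
    mean_true = sum(y_true) / len(y_true)
    # u: residual sum of squares, v: total sum of squares
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    v = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - u / v


y_true = [3.0, -0.5, 2.0, 7.0]
mean_true = sum(y_true) / len(y_true)

good = r2_score_sketch(y_true, [2.5, 0.0, 2.0, 8.0])   # close fit
constant = r2_score_sketch(y_true, [mean_true] * 4)    # always predicts the mean
bad = r2_score_sketch(y_true, [10.0] * 4)              # worse than the mean
```

Here u = 1.5 and v = 29.1875 for the close fit, giving roughly 0.949. The constant-mean prediction scores 0.0, and the poor prediction scores below zero, matching the three cases the description mentions.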

set_params(self, **params)
Set the parameters of this estimator. Modification of the sklearn method to allow unknown kwargs. This allows using the full range of xgboost parameters that are not defined as member variables in sklearn grid search.
Returns:
  self
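
One way a set_params that tolerates unknown keyword arguments could behave is sketched below. This is an illustration under assumptions, not the library's actual implementation: ToyEstimator is a hypothetical class, and storing unrecognised names in a kwargs dictionary is an assumed design. The parameter name tree_method is used here only as an example key.

```python
class ToyEstimator:
    """Hypothetical estimator; not the dask_ml or xgboost class."""

    def __init__(self, max_depth=3, **kwargs):
        self.max_depth = max_depth
        self.kwargs = kwargs

    def set_params(self, **params):
        for key, value in params.items():
            if hasattr(self, key):
                # Name matches an existing attribute: overwrite it.
                setattr(self, key, value)
            else:
                # Unknown name: keep it in kwargs instead of raising.
                self.kwargs[key] = value
        return self


est = ToyEstimator().set_params(max_depth=5, tree_method="hist")
```

Returning self keeps the method chainable, which is the behaviour the Returns entry describes.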