dask_ml.xgboost.XGBRegressor

class dask_ml.xgboost.XGBRegressor(max_depth=3, learning_rate=0.1, n_estimators=100, silent=True, objective='reg:linear', booster='gbtree', n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None, missing=None, **kwargs)
Attributes:
coef_

Coefficients property

feature_importances_

Feature importances property

intercept_

Intercept (bias) property

Methods

apply(X[, ntree_limit]) Return the predicted leaf of every tree for each sample.
evals_result() Return the evaluation results.
fit(X[, y]) Fit the gradient boosting model.
get_booster() Get the underlying xgboost Booster of this model.
get_params([deep]) Get parameters.
get_xgb_params() Get xgboost type parameters.
load_model(fname) Load the model from a file.
predict(X) Predict with data.
save_model(fname) Save the model to a file.
score(X, y[, sample_weight]) Returns the coefficient of determination R^2 of the prediction.
set_params(**params) Set the parameters of this estimator.
__init__(max_depth=3, learning_rate=0.1, n_estimators=100, silent=True, objective='reg:linear', booster='gbtree', n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None, missing=None, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

apply(X, ntree_limit=0)

Return the predicted leaf of every tree for each sample.

Parameters:
X : array_like, shape=[n_samples, n_features]

Input features matrix.

ntree_limit : int

Limit number of trees in the prediction; defaults to 0 (use all trees).

Returns:
X_leaves : array_like, shape=[n_samples, n_trees]

For each datapoint x in X and for each tree, return the index of the leaf x ends up in. Leaves are numbered within [0; 2**(self.max_depth+1)), possibly with gaps in the numbering.
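For illustration, a hedged sketch (est is assumed to be an already-fitted XGBRegressor, as in the fit example below; the small in-memory matrix and its shape are illustrative):

>>> import numpy as np
>>> X_local = np.random.random((5, 10))   # 5 samples, 10 features
>>> leaves = est.apply(X_local)           # shape (5, n_estimators)
>>> leaves[0]                             # leaf index of sample 0 in each tree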

coef_

Coefficients property

Note

Coefficients are defined only for linear learners

Coefficients are only defined when the linear model is chosen as base learner (booster=gblinear). It is not defined for other base learner types, such as tree learners (booster=gbtree).

Returns:
coef_ : array of shape [n_features]
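A minimal sketch of reading coefficients, assuming a model fitted with booster='gblinear' (X and y are dask collections as in the fit example below):

>>> linear_est = XGBRegressor(booster='gblinear')
>>> linear_est.fit(X, y)
>>> linear_est.coef_                      # one coefficient per input feature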
evals_result()

Return the evaluation results.

If eval_set is passed to the fit function, you can call evals_result() to get evaluation results for all passed eval sets. When eval_metric is also passed to fit, evals_result will contain those metrics as well.

Returns:
evals_result : dictionary
feature_importances_

Feature importances property

Note

Feature importance is defined only for tree boosters

Feature importance is only defined when the decision tree model is chosen as base learner (booster=gbtree). It is not defined for other base learner types, such as linear learners (booster=gblinear).

Returns:
feature_importances_ : array of shape [n_features]
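A hedged sketch, assuming a tree-based model (the default booster='gbtree') fitted on the dask collections from the fit example below:

>>> tree_est = XGBRegressor()             # booster='gbtree' by default
>>> tree_est.fit(X, y)
>>> importances = tree_est.feature_importances_
>>> importances.argsort()[::-1]           # feature indices, most important first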
fit(X, y=None)

Fit the gradient boosting model.

Parameters:
X : array-like [n_samples, n_features]
y : array-like
Returns:
self : the fitted Regressor

Notes

This differs from the XGBoost version in that it does not support the eval_set, eval_metric, early_stopping_rounds, and verbose fit kwargs.
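A minimal usage sketch, assuming a dask.distributed Client is available (this estimator trains against the distributed scheduler; the data shapes, chunk sizes, and hyperparameters are illustrative):

>>> from dask.distributed import Client
>>> client = Client()                     # local scheduler, for illustration
>>> import dask.array as da
>>> from dask_ml.xgboost import XGBRegressor
>>> X = da.random.random((1000, 10), chunks=(100, 10))
>>> y = da.random.random(1000, chunks=100)
>>> est = XGBRegressor(n_estimators=50, max_depth=4)
>>> est.fit(X, y)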

get_booster()

Get the underlying xgboost Booster of this model.

This raises an exception if fit has not been called.

Returns:
booster : the xgboost.Booster of the underlying model
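A short sketch of inspecting the fitted booster (get_score and its importance_type argument are part of the xgboost Booster API; est is the fitted estimator from the fit example above):

>>> bst = est.get_booster()                   # xgboost.Booster
>>> bst.get_score(importance_type='weight')   # split counts per feature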
get_params(deep=False)

Get parameters.

get_xgb_params()

Get xgboost type parameters.

intercept_

Intercept (bias) property

Note

Intercept is defined only for linear learners

Intercept (bias) is only defined when the linear model is chosen as base learner (booster=gblinear). It is not defined for other base learner types, such as tree learners (booster=gbtree).

Returns:
intercept_ : array of shape (1,)
load_model(fname)

Load the model from a file.

The model is loaded from an XGBoost internal binary format which is universal among the various XGBoost interfaces. Auxiliary attributes of the Python Booster object (such as feature names) will not be loaded. Label encodings (text labels to numeric labels) will also be lost. If you are using only the Python interface, we recommend pickling the model object for best results.

Parameters:
fname : string or a memory buffer

Input file name or memory buffer (see also save_raw)
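Since the docstring above recommends pickling when staying within Python, a minimal sketch (the file name is illustrative):

>>> import pickle
>>> with open('est.pkl', 'wb') as f:
...     pickle.dump(est, f)
>>> with open('est.pkl', 'rb') as f:
...     est2 = pickle.load(f)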

predict(X)

Predict with data.

Note

This function is not thread safe.

For each booster object, predict can only be called from one thread. If you want to run prediction using multiple threads, call xgb.copy() to make copies of the model object and then call predict().

Note

Using predict() with DART booster

If the booster object is DART type, predict() will perform dropouts, i.e. only some of the trees will be evaluated. This will produce incorrect results if data is not the training data. To obtain correct results on test sets, set ntree_limit to a nonzero value, e.g.

preds = bst.predict(dtest, ntree_limit=num_round)
Parameters:
X : array-like

The input data to predict with.

output_margin : bool

Whether to output the raw untransformed margin value.

ntree_limit : int

Limit number of trees in the prediction; defaults to best_ntree_limit if defined (i.e. it has been trained with early stopping), otherwise 0 (use all trees).

validate_features : bool

When this is True, validate that the Booster’s and data’s feature_names are identical. Otherwise, it is assumed that the feature_names are the same.

Returns:
prediction : numpy array
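A hedged sketch with a dask collection (this assumes the fitted est and the dask array X from the fit example above; with dask input the result comes back lazily, so computing it is shown explicitly):

>>> preds = est.predict(X)
>>> preds.compute()                       # materialize the predictions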
save_model(fname)

Save the model to a file.

The model is saved in an XGBoost internal binary format which is universal among the various XGBoost interfaces. Auxiliary attributes of the Python Booster object (such as feature names) will not be saved. Label encodings (text labels to numeric labels) will also be lost. If you are using only the Python interface, we recommend pickling the model object for best results.

Parameters:
fname : string

Output file name
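A sketch of a save/load round trip through the binary format (the file name is illustrative):

>>> est.save_model('xgb-reg.model')
>>> fresh = XGBRegressor()
>>> fresh.load_model('xgb-reg.model')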

score(X, y, sample_weight=None)

Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.

Parameters:
X : array-like, shape = (n_samples, n_features)

Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = [n_samples, n_samples_fitted], where n_samples_fitted is the number of samples used in the fitting for the estimator.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True values for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:
score : float

R^2 of self.predict(X) w.r.t. y.
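A minimal sketch, assuming the fitted est and the dask collections X and y from the fit example above (the collections are assumed small enough to compute in memory for scoring):

>>> r2 = est.score(X, y)
>>> r2 <= 1.0                             # best possible score is 1.0
True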

set_params(**params)

Set the parameters of this estimator. This is a modification of the sklearn method to allow unknown kwargs, so the full range of xgboost parameters, not defined as member variables in sklearn grid search, can be used.

Returns:
self
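A short sketch; tree_method here stands in for any xgboost parameter that is not an explicit sklearn member variable:

>>> est = est.set_params(max_depth=6, tree_method='hist')
>>> est.get_params()['max_depth']
6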