# dask_ml.xgboost.XGBClassifier

class dask_ml.xgboost.XGBClassifier(*, objective='binary:logistic', use_label_encoder=True, **kwargs)
Attributes

- best_iteration
- best_ntree_limit
- best_score
- coef_ : Coefficients property
- feature_importances_ : Feature importances property
- intercept_ : Intercept (bias) property
- n_features_in_

Methods

- apply(X[, ntree_limit, iteration_range]) : Return the predicted leaf of every tree for each sample.
- evals_result() : Return the evaluation results.
- fit(X[, y, classes, eval_set, …]) : Fit a gradient boosting classifier.
- get_booster() : Get the underlying xgboost Booster of this model.
- get_num_boosting_rounds() : Get the number of xgboost boosting rounds.
- get_params([deep]) : Get parameters.
- get_xgb_params() : Get xgboost-specific parameters.
- load_model(fname) : Load the model from a file.
- predict(X) : Predict with X.
- predict_proba(data[, ntree_limit]) : Predict the probability of each X example being of a given class.
- save_model(fname) : Save the model to a file.
- score(X, y[, sample_weight]) : Return the mean accuracy on the given test data and labels.
- set_params(**params) : Set the parameters of this estimator.
__init__(*, objective='binary:logistic', use_label_encoder=True, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

apply(X, ntree_limit: int = 0, iteration_range: Optional[Tuple[int, int]] = None) → numpy.ndarray

Return the predicted leaf of every tree for each sample.

Parameters:

X : array_like, shape=[n_samples, n_features]
    Input features matrix.
ntree_limit : int
    Limit the number of trees in the prediction; defaults to 0 (use all trees).

Returns:

X_leaves : array_like, shape=[n_samples, n_trees]
    For each datapoint x in X and for each tree, the index of the leaf x ends up in. Leaves are numbered within [0; 2**(self.max_depth+1)), possibly with gaps in the numbering.
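For illustration, a hedged sketch (the five-row slice and the use of an in-memory matrix are assumptions; clf is the fitted estimator from the fit example further down this page):

```python
# Assumption: clf and X are the fitted estimator and dask array from the
# fit example further down this page.
X_small = X[:5].compute()    # apply follows the upstream sklearn API, so a
leaves = clf.apply(X_small)  # small in-memory matrix is used here
print(leaves.shape)          # (5, n_trees): one leaf index per tree and sample
```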
coef_

Coefficients property

Note

Coefficients are defined only for linear learners

Coefficients are only defined when the linear model is chosen as base learner (booster=gblinear). It is not defined for other base learner types, such as tree learners (booster=gbtree).

Returns:

coef_ : array of shape [n_features] or [n_classes, n_features]
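A hedged sketch: coef_ (and intercept_, documented below) require booster=gblinear, and whether gblinear trains under the dask backend is an assumption here:

```python
from dask_ml.xgboost import XGBClassifier

# Assumption: X and y are the dask arrays from the fit example further down.
lin = XGBClassifier(booster="gblinear", n_estimators=10)
lin.fit(X, y, classes=[0, 1])
print(lin.coef_)       # shape (n_features,) for a binary objective
print(lin.intercept_)  # see intercept_ below; shape (1,) for a binary objective
```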
evals_result()

Return the evaluation results.

If eval_set is passed to the fit function, you can call evals_result() to get evaluation results for all of the passed validation sets. If eval_metric is also passed to fit, the results will contain the metrics requested there.

Returns:

evals_result : dictionary
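A hedged sketch of this pattern (the validation split, metric name, and use of in-memory validation arrays are illustrative assumptions; see fit below for eval_set semantics):

```python
from dask_ml.xgboost import XGBClassifier

# Assumption: X and y are the dask arrays from the fit example further down.
X_val, y_val = X[:200].compute(), y[:200].compute()  # in-memory validation split

clf = XGBClassifier(n_estimators=10)
clf.fit(X, y, classes=[0, 1],
        eval_set=[(X_val, y_val)], eval_metric="logloss")

# Key names follow xgboost's convention, e.g. {'validation_0': {'logloss': [...]}}
print(clf.evals_result())
```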
feature_importances_

Feature importances property

Note

Feature importance is defined only for tree boosters

Feature importance is only defined when the decision tree model is chosen as base learner (booster=gbtree). It is not defined for other base learner types, such as linear learners (booster=gblinear).

Returns:

feature_importances_ : array of shape [n_features]
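For illustration, assuming the fitted tree-boosted estimator from the fit example further down this page:

```python
# Assumption: clf is the fitted (tree-boosted) estimator from the fit
# example further down this page.
importances = clf.feature_importances_          # shape (n_features,)
print(importances.argmax(), importances.max())  # most influential feature
```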
fit(X, y=None, classes=None, eval_set=None, sample_weight=None, sample_weight_eval_set=None, eval_metric=None, early_stopping_rounds=None)

Fit a gradient boosting classifier.

Parameters:

X : array-like of shape [n_samples, n_features]
    Feature matrix. May be a dask.array or dask.dataframe.
y : array-like
    Labels.
classes : sequence, optional
    The unique values in y. If not specified, this will be eagerly computed from y before training.
eval_set : list, optional
    A list of (X, y) tuple pairs to use as validation sets, for which metrics will be computed. Validation metrics help track the performance of the model.
sample_weight : array_like, optional
    Instance weights.
sample_weight_eval_set : list, optional
    A list of the form [L_1, L_2, …, L_n], where each L_i is a list of instance weights on the i-th validation set.
eval_metric : str, list of str, or callable, optional
    If a str, should be a built-in evaluation metric to use (see doc/parameter.rst). If a list of str, the list of multiple built-in evaluation metrics to use. If callable, a custom evaluation metric with signature func(y_predicted, y_true), where y_true will be a DMatrix object, so you may need to call its get_label method. It must return a (str, value) pair, where the str names the evaluation and value is the value of the evaluation function. The callable custom objective is always minimized.
early_stopping_rounds : int, optional
    Activates early stopping. The validation metric needs to improve at least once in every early_stopping_rounds round(s) for training to continue. Requires at least one item in eval_set. The method returns the model from the last iteration (not the best one). If there is more than one item in eval_set, the last entry is used for early stopping; if there is more than one metric in eval_metric, the last metric is used. If early stopping occurs, the model will have three additional fields: clf.best_score, clf.best_iteration and clf.best_ntree_limit.

Returns:

self : XGBClassifier

Notes

This differs from the XGBoost version in three ways:

1. The verbose fit kwargs are not supported.
2. The labels are not automatically label-encoded.
3. The classes_ and n_classes_ attributes are not learned.
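A minimal end-to-end sketch, assuming a running dask.distributed scheduler (which dask_ml.xgboost uses for training) and synthetic, illustrative data:

```python
import dask.array as da
from dask.distributed import Client
from dask_ml.xgboost import XGBClassifier

client = Client()  # assumption: training runs on the active distributed client

# Synthetic, chunked binary-classification data (sizes are illustrative)
X = da.random.random((1000, 20), chunks=(250, 20))
y = (X[:, 0] > 0.5).astype("int64")

clf = XGBClassifier(n_estimators=10)
# Passing classes up front avoids eagerly computing the unique values of y
clf.fit(X, y, classes=[0, 1])
```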
get_booster()

Get the underlying xgboost Booster of this model.

This raises an exception if fit has not been called.

Returns:

booster : Booster
    The underlying xgboost Booster of this model.
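For example, assuming the fitted estimator from the fit example above (get_dump is a long-standing Booster method, used here only to count trees):

```python
# Assumption: clf is the fitted estimator from the fit example above.
booster = clf.get_booster()
print(len(booster.get_dump()))  # get_dump lists the trees, one string each
```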
get_num_boosting_rounds()

Get the number of xgboost boosting rounds.

get_params(deep=True)

Get parameters.

get_xgb_params()

Get xgboost-specific parameters.
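For illustration, a small sketch contrasting get_params and get_xgb_params on a fresh estimator (the exact key sets returned are an assumption):

```python
from dask_ml.xgboost import XGBClassifier

clf = XGBClassifier(max_depth=4, learning_rate=0.1)
print(clf.get_params()["max_depth"])      # sklearn-style parameter dict
print(clf.get_xgb_params()["max_depth"])  # parameters as passed to xgboost
```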

intercept_

Intercept (bias) property

Note

Intercept is defined only for linear learners

Intercept (bias) is only defined when the linear model is chosen as base learner (booster=gblinear). It is not defined for other base learner types, such as tree learners (booster=gbtree).

Returns:

intercept_ : array of shape (1,) or [n_classes]
load_model(fname)

Load the model from a file.

The model is loaded from an XGBoost internal format which is universal among the various XGBoost interfaces. Auxiliary attributes of the Python Booster object (such as feature names) will not be loaded.

Parameters:

fname : string
    Input file name.
predict(X)

Predict with X.

Note

This function is only thread safe for gbtree and dart.

Parameters:

X : array_like
    Data to predict with.
output_margin : bool
    Whether to output the raw untransformed margin value.
ntree_limit : int
    Deprecated; use iteration_range instead.
validate_features : bool
    When True, validate that the Booster's and the data's feature_names are identical. Otherwise, the feature_names are assumed to be the same.
base_margin : array_like
    Margin added to prediction.
iteration_range : tuple of int, optional
    Specifies which layer of trees is used in prediction. For example, if a random forest is trained with 100 rounds, specifying iteration_range=(10, 20) uses only the forests built during rounds [10, 20) (half-open interval) in this prediction. New in version 1.4.0.

Returns:

prediction : numpy array
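For example, assuming the fitted estimator and dask array from the fit example above (that the result is a lazy dask collection for dask input is an assumption here):

```python
# Assumption: clf and X are the fitted estimator and dask array from the
# fit example above.
preds = clf.predict(X)       # lazy for dask input; no work happens yet
print(preds[:10].compute())  # materialize the first ten class labels
```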
predict_proba(data, ntree_limit=None)

Predict the probability of each X example being of a given class.

Note

This function is only thread safe for gbtree and dart.

Parameters:

data : array_like
    Feature matrix.
ntree_limit : int
    Deprecated; use iteration_range instead.
validate_features : bool
    When True, validate that the Booster's and the data's feature_names are identical. Otherwise, the feature_names are assumed to be the same.
base_margin : array_like
    Margin added to prediction.
iteration_range : tuple of int, optional
    Specifies which layer of trees is used in prediction. For example, if a random forest is trained with 100 rounds, specifying iteration_range=(10, 20) uses only the forests built during rounds [10, 20) (half-open interval) in this prediction.

Returns:

prediction : numpy array of shape (n_samples, n_classes)
    The probability of each data example being of a given class.
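Similarly, a hedged sketch assuming the fitted estimator and dask array from the fit example above:

```python
# Assumption: clf and X are the fitted estimator and dask array from the
# fit example above.
proba = clf.predict_proba(X)
print(proba[:5].compute())  # rows of per-class probabilities, one per sample
```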
save_model(fname: str)

Save the model to a file.

The model is saved in an XGBoost internal format which is universal among the various XGBoost interfaces. Auxiliary attributes of the Python Booster object (such as feature names) will not be saved.

Parameters:

fname : string
    Output file name.
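A hedged round-trip sketch covering save_model together with load_model above (the file name is illustrative):

```python
from dask_ml.xgboost import XGBClassifier

# Assumption: clf is the fitted estimator from the fit example above.
clf.save_model("xgb-model.bin")

clf2 = XGBClassifier()
clf2.load_model("xgb-model.bin")
# Auxiliary Python-side attributes (e.g. feature names) are not restored.
```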
score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that each label set be correctly predicted for every sample.

Parameters:

X : array-like of shape (n_samples, n_features)
    Test samples.
y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True labels for X.
sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.

Returns:

score : float
    Mean accuracy of self.predict(X) with respect to y.
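Since score is the standard scikit-learn mean accuracy, a hedged sketch of the equivalent computation via predict (clf, X and y as in the fit example above):

```python
# Assumption: clf, X and y are from the fit example above.
preds = clf.predict(X).compute()
acc = (preds == y.compute()).mean()  # what score(X, y) returns, ignoring weights
print(acc)
```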
set_params(**params)

Set the parameters of this estimator. This is a modification of the scikit-learn method that allows unknown kwargs, so the full range of xgboost parameters, including those not defined as member variables, can be used in scikit-learn grid search.

Returns:

self
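For example, a sketch on a fresh estimator; tree_method is a standard xgboost parameter used here to illustrate the pass-through of kwargs unknown to scikit-learn:

```python
from dask_ml.xgboost import XGBClassifier

clf = XGBClassifier()
clf.set_params(max_depth=4, learning_rate=0.1)  # ordinary xgboost parameters
clf.set_params(tree_method="hist")              # accepted even though it is not
                                                # a declared sklearn member
```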