dask_ml.linear_model.LinearRegression

class dask_ml.linear_model.LinearRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1.0, class_weight=None, random_state=None, solver='admm', max_iter=100, multi_class='ovr', verbose=0, warm_start=False, n_jobs=1, solver_kwargs=None)

Estimator for linear regression.

Parameters:
penalty : str or Regularizer, default ‘l2’

Regularizer to use. Only relevant for the ‘admm’, ‘lbfgs’ and ‘proximal_grad’ solvers.

For string values, only ‘l1’ or ‘l2’ are valid.

dual : bool

Ignored

tol : float, default 1e-4

The tolerance for convergence.

C : float, default 1.0

Regularization strength. Note that dask-glm solvers use the parameterization \(\lambda = 1 / C\).
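The inverse parameterization means a larger C gives weaker regularization, matching the scikit-learn convention. A minimal sketch of the conversion (`reg_strength` is an illustrative helper name, not part of dask-ml):

```python
# dask-glm parameterizes regularization as lambda = 1 / C, so a
# *larger* C means *weaker* regularization (scikit-learn convention).
def reg_strength(C: float) -> float:
    """Convert a scikit-learn-style C to dask-glm's lambda."""
    return 1.0 / C

print(reg_strength(1.0))   # 1.0
print(reg_strength(10.0))  # 0.1
```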

fit_intercept : bool, default True

Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

intercept_scaling : float, default 1.0

Ignored

class_weight : dict or ‘balanced’

Ignored

random_state : int, RandomState, or None

The seed of the pseudo-random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

solver : {‘admm’, ‘gradient_descent’, ‘newton’, ‘lbfgs’, ‘proximal_grad’}

Solver to use. See Algorithms for details.

max_iter : int, default 100

Maximum number of iterations taken for the solvers to converge.

multi_class : str, default ‘ovr’

Ignored. Multiclass solvers not currently supported.

verbose : int, default 0

Ignored

warm_start : bool, default False

Ignored

n_jobs : int, default 1

Ignored

solver_kwargs : dict, optional, default None

Extra keyword arguments to pass through to the solver.

Attributes:
coef_ : array, shape (n_classes, n_features)

The learned value for the model’s coefficients

intercept_ : float or None

The learned value for the intercept, if one was added to the model

Examples

>>> from dask_glm.datasets import make_regression
>>> X, y = make_regression()
>>> lr = LinearRegression()
>>> lr.fit(X, y)
>>> lr.predict(X)
>>> lr.score(X, y)

Methods

fit(X[, y]) Fit the model on the training data
get_params([deep]) Get parameters for this estimator.
predict(X) Predict values for samples in X.
score(X, y) Returns the coefficient of determination R^2 of the prediction.
set_params(**params) Set the parameters of this estimator.
__init__(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1.0, class_weight=None, random_state=None, solver='admm', max_iter=100, multi_class='ovr', verbose=0, warm_start=False, n_jobs=1, solver_kwargs=None)

Initialize self. See help(type(self)) for accurate signature.

family

The family this estimator is for.

fit(X, y=None)

Fit the model on the training data

Parameters:
X : array-like, shape (n_samples, n_features)
y : array-like, shape (n_samples,)
Returns:
self : object
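With no regularization, the target of fit is the ordinary least-squares solution. A NumPy sketch of that target for intuition only — dask-ml reaches the same solution with distributed iterative solvers rather than a dense normal-equations solve:

```python
import numpy as np

# Illustrative only: the least-squares problem fit() solves.
rng = np.random.RandomState(0)
X = rng.randn(100, 3)
true_coef = np.array([1.5, -2.0, 0.5])
y = X @ true_coef + 0.01 * rng.randn(100)

# Append a column of ones to model the intercept (fit_intercept=True).
X1 = np.hstack([X, np.ones((100, 1))])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
coef_, intercept_ = beta[:-1], beta[-1]
print(coef_)  # close to true_coef
```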
get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

predict(X)

Predict values for samples in X.

Parameters:
X : array-like, shape = [n_samples, n_features]
Returns:
C : array, shape = [n_samples,]

Predicted value for each sample
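The prediction itself is the linear decision function applied to each row. A minimal sketch, with `coef_` and `intercept_` standing in for the fitted attributes:

```python
import numpy as np

# Sketch of the linear decision function predict() evaluates:
#   y_hat = X @ coef_ + intercept_
coef_ = np.array([2.0, -1.0])
intercept_ = 0.5
X = np.array([[1.0, 1.0],
              [0.0, 2.0]])
y_hat = X @ coef_ + intercept_
print(y_hat)  # [ 1.5 -1.5]
```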

score(X, y)

Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.

Parameters:
X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True values for X.

Returns:
score : float

R^2 of self.predict(X) w.r.t. y.
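The R^2 definition above translates directly into a few lines of NumPy. This is the formula restated as code, not dask-ml's implementation:

```python
import numpy as np

def r2(y_true, y_pred):
    u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1.0 - u / v

y = np.array([1.0, 2.0, 3.0])
print(r2(y, y))                     # 1.0 (perfect prediction)
print(r2(y, np.full(3, y.mean())))  # 0.0 (constant mean predictor)
```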

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
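Nested parameter names follow the scikit-learn `<component>__<parameter>` convention: the part before the first `__` names the sub-object, the rest names its parameter. An illustrative parse of that convention (`regularizer__lam` is a hypothetical parameter name, and this is not dask-ml's actual routing code):

```python
# Split a scikit-learn-style nested parameter name into the target
# component and the parameter to set on it.
def split_param(name: str):
    if "__" in name:
        component, _, param = name.partition("__")
        return component, param
    return None, name  # a plain parameter on the estimator itself

print(split_param("solver_kwargs"))     # (None, 'solver_kwargs')
print(split_param("regularizer__lam"))  # ('regularizer', 'lam')
```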