Changelog
Version 0.12.0
API Breaking Changes
- dask_ml.model_selection.IncrementalSearchCV now returns Dask objects for post-fit methods like predict (GH#423).
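For illustration, here is a minimal sketch of the lazy post-fit behavior; the estimator, parameter grid, and synthetic data are assumptions, not part of the release itself:

    import numpy as np
    from dask_ml.datasets import make_classification
    from dask_ml.model_selection import IncrementalSearchCV
    from sklearn.linear_model import SGDClassifier

    # Synthetic chunked data standing in for a larger-than-memory dataset.
    X, y = make_classification(n_samples=10_000, chunks=1_000, random_state=0)
    params = {"alpha": np.logspace(-4, 0, 5)}

    search = IncrementalSearchCV(SGDClassifier(tol=1e-3), params)
    search.fit(X, y, classes=[0, 1])

    # As of 0.12.0, post-fit methods return lazy Dask collections.
    predictions = search.predict(X)   # a dask.array; nothing computed yet
    predictions.compute()             # materialize when you choose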
Version 0.11.0
Note that this version of Dask-ML requires scikit-learn >= 0.20.0.
Enhancements
- Added dask_ml.model_selection.IncrementalSearchCV, a meta-estimator for hyperparameter optimization on larger-than-memory datasets (GH#356). See Incremental Hyperparameter Optimization for more.
- Added dask_ml.preprocessing.PolynomialTransformer, a drop-in replacement for the scikit-learn version (GH#347).
- Added auto-rechunking to Dask Arrays with more than one block along the features in dask_ml.wrappers.ParallelPostFit (GH#376).
- Added support for Dask DataFrame inputs to dask_ml.cluster.KMeans (GH#390).
- Added a compute keyword to dask_ml.wrappers.ParallelPostFit.score() to support lazily evaluating a model's score (GH#402); see the sketch below.
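A minimal sketch of the new compute keyword, assuming a ParallelPostFit wrapper around an illustrative scikit-learn estimator:

    from dask_ml.datasets import make_classification
    from dask_ml.wrappers import ParallelPostFit
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=10_000, chunks=1_000, random_state=0)

    clf = ParallelPostFit(LogisticRegression())
    clf.fit(X.compute(), y.compute())   # fitting happens in memory

    lazy_score = clf.score(X, y, compute=False)  # a lazy Dask object, not a float
    lazy_score.compute()                         # evaluate on demand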
Bug Fixes
- Changed dask_ml.wrappers.ParallelPostFit to automatically rechunk input arrays to methods like predict when they have more than one block along the features (GH#376).
- Fixed a bug in dask_ml.impute.SimpleImputer where Dask DataFrames were filled with the count of the most frequent item, rather than the item itself (GH#385).
- Fixed a bug in dask_ml.model_selection.ShuffleSplit returning the same split when the random_state was set (GH#380).
Version 0.10.0
Enhancements
- Added support for dask.dataframe.DataFrame to dask_ml.model_selection.train_test_split() (GH#351); see the sketch below.
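For illustration, a minimal sketch of splitting a Dask DataFrame; the frame and column names are hypothetical:

    import dask.dataframe as dd
    import pandas as pd
    from dask_ml.model_selection import train_test_split

    df = dd.from_pandas(
        pd.DataFrame({"feature": range(100), "target": range(100)}),
        npartitions=4,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        df[["feature"]], df["target"], random_state=0
    )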
Version 0.9.0
Bug Fixes
- Fixed handling of errors in the predict and score steps of dask_ml.model_selection.GridSearchCV and dask_ml.model_selection.RandomizedSearchCV (GH#339).
- Compatibility with Dask 0.18 for dask_ml.preprocessing.LabelEncoder (you'll also notice improved performance) (GH#336).
Documentation Updates
- Added a Dask-ML Roadmap. Please open an issue if you’d like something to be included on the roadmap. (GH#322)
- Added many Examples to the documentation and the dask examples binder.
Version 0.8.0
Enhancements
- Automatically replace default scikit-learn scorers with Dask-aware versions in Incremental (GH#200).
- Added the dask_ml.metrics.log_loss() loss function and neg_log_loss scorer (GH#318); see the sketch below.
- Fixed handling of array-like fit parameters to GridSearchCV and BaseSearchCV (GH#320).
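A minimal sketch of the new metric on Dask arrays, mirroring sklearn.metrics.log_loss; the toy labels and probabilities are assumptions:

    import dask.array as da
    import numpy as np
    from dask_ml.metrics import log_loss

    y_true = da.from_array(np.array([0, 1, 1, 0]), chunks=2)
    y_prob = da.from_array(
        np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.6, 0.4]]),
        chunks=(2, 2),
    )
    print(log_loss(y_true, y_prob))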
Version 0.7.0
Enhancements
- Added sample_weight support for dask_ml.metrics.accuracy_score() (GH#217); see the sketch below.
- Improved the training performance of dask_ml.cluster.SpectralClustering (GH#152).
- Added dask_ml.preprocessing.LabelEncoder (GH#226).
- Fixed an issue with model_selection meta-estimators not respecting the default Dask scheduler (GH#260).
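A minimal sketch of weighted accuracy on Dask arrays; the data below is illustrative:

    import dask.array as da
    import numpy as np
    from dask_ml.metrics import accuracy_score

    y_true = da.from_array(np.array([0, 1, 1, 0]), chunks=2)
    y_pred = da.from_array(np.array([0, 1, 0, 0]), chunks=2)
    weights = da.from_array(np.array([1.0, 1.0, 3.0, 1.0]), chunks=2)

    # Misclassified samples with larger weights now pull the score down more.
    print(accuracy_score(y_true, y_pred, sample_weight=weights))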
API Breaking Changes
- Removed the basis_inds_ attribute from dask_ml.cluster.SpectralClustering as it's no longer used (GH#152).
- Changed dask_ml.wrappers.Incremental.fit() to clone the underlying estimator before training (GH#258). This induces a few changes:
  - The underlying estimator no longer gives access to learned attributes like coef_. We recommend using Incremental.coef_ instead.
  - State no longer leaks between successive fit calls. Note that Incremental.partial_fit() is still available if you want state, like learned attributes or random seeds, to be reused. This is useful if you're making multiple passes over the training data.
- Changed get_params and set_params for dask_ml.wrappers.Incremental to no longer magically get / set parameters for the underlying estimator (GH#258). To specify parameters for the underlying estimator, use the double-underscore prefix convention established by scikit-learn: inc.set_params(estimator__alpha=10). See the sketch below.
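A minimal sketch of the double-underscore convention; the wrapped SGDClassifier is an assumption for illustration:

    from dask_ml.wrappers import Incremental
    from sklearn.linear_model import SGDClassifier

    inc = Incremental(SGDClassifier())

    # The estimator__ prefix routes the parameter to the wrapped estimator.
    inc.set_params(estimator__alpha=10)
    print(inc.get_params()["estimator__alpha"])  # 10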
Reorganization
Dask-SearchCV is now being developed in the dask/dask-ml repository. Users who previously installed dask-searchcv should now just install dask-ml.
Version 0.6.0
API Breaking Changes
- Removed the get keyword from the incremental learner fit methods (GH#187).
- Deprecated the various Partial* estimators in favor of the dask_ml.wrappers.Incremental meta-estimator (GH#190).
Enhancements
- Added a new meta-estimator, dask_ml.wrappers.Incremental, for wrapping any estimator with a partial_fit method. See Incremental Meta-estimator for more (GH#190), and the sketch after this list.
- Added an R2-score metric, dask_ml.metrics.r2_score().
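For illustration, a minimal sketch of wrapping a partial_fit-capable estimator; the model and synthetic data are assumptions:

    from dask_ml.datasets import make_classification
    from dask_ml.wrappers import Incremental
    from sklearn.linear_model import SGDClassifier

    X, y = make_classification(n_samples=10_000, chunks=1_000, random_state=0)

    inc = Incremental(SGDClassifier(tol=1e-3))
    inc.fit(X, y, classes=[0, 1])   # each block of X, y is fed to partial_fit
    print(inc.coef_)                # learned attributes exposed on the wrapper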
Version 0.5.0
Enhancements
- Added dask_ml.model_selection.train_test_split() and dask_ml.model_selection.ShuffleSplit (GH#172); see the sketch after this list.
- Added dask_ml.metrics.classification_score(), dask_ml.metrics.mean_absolute_error(), and dask_ml.metrics.mean_squared_error().
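A minimal sketch of the new splitter on a Dask array; the shapes and parameters are illustrative:

    import dask.array as da
    from dask_ml.model_selection import ShuffleSplit

    X = da.random.uniform(size=(100, 4), chunks=(20, 4))

    splitter = ShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
    for train_idx, test_idx in splitter.split(X):
        X_train, X_test = X[train_idx], X[test_idx]  # lazy selections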
Version 0.4.1
This release added several new estimators.
Enhancements
Added dask_ml.preprocessing.RobustScaler
Scale features using statistics that are robust to outliers. This mirrors sklearn.preprocessing.RobustScaler (GH#62).
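A minimal sketch on a Dask array; the synthetic skewed data is an assumption:

    import dask.array as da
    import numpy as np
    from dask_ml.preprocessing import RobustScaler

    X = da.from_array(
        np.random.RandomState(0).lognormal(size=(100, 3)), chunks=(20, 3)
    )

    scaler = RobustScaler()
    X_scaled = scaler.fit_transform(X)  # centers on the median, scales by the IQR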
Added dask_ml.preprocessing.OrdinalEncoder
Encodes categorical features as ordinal, in one ordered feature (GH#119).
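A minimal sketch with a categorical Dask DataFrame column; the frame is illustrative:

    import dask.dataframe as dd
    import pandas as pd
    from dask_ml.preprocessing import OrdinalEncoder

    df = dd.from_pandas(
        pd.DataFrame({"color": pd.Categorical(["red", "blue", "red", "green"])}),
        npartitions=2,
    )
    enc = OrdinalEncoder()
    encoded = enc.fit_transform(df)  # the categorical column becomes integer codes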
Added dask_ml.wrappers.ParallelPostFit
A meta-estimator for fitting with any scikit-learn estimator, but post-processing (predict, transform, etc.) in parallel on Dask arrays. See Parallel Meta-estimators for more (GH#132).
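For illustration, a minimal sketch; the estimator and data are assumptions:

    import dask.array as da
    from dask_ml.wrappers import ParallelPostFit
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = make_classification(n_samples=1_000, random_state=0)

    clf = ParallelPostFit(GradientBoostingClassifier())
    clf.fit(X, y)                       # ordinary in-memory fit

    big_X = da.from_array(X, chunks=100)
    preds = clf.predict(big_X)          # lazy, block-wise prediction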
Version 0.4.0
API Changes
- Changed the arguments of the dask-glm based estimators in dask_ml.linear_model to match scikit-learn's API (GH#94):
  - To specify lambduh, use C = 1.0 / lambduh (the default of 1.0 is unchanged).
  - The rho, over_relax, abstol and reltol arguments have been removed. Provide them in solver_kwargs instead.
This affects the LinearRegression, LogisticRegression and PoissonRegression estimators.
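A minimal sketch of the new argument style; the regularization value and solver option below are illustrative:

    from dask_ml.linear_model import LogisticRegression

    # Previously something like LogisticRegression(lambduh=0.1, abstol=1e-4);
    # now regularization is expressed as C = 1.0 / lambduh, and solver-specific
    # options move into solver_kwargs.
    lr = LogisticRegression(
        C=1.0 / 0.1,
        solver_kwargs={"abstol": 1e-4},
    )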