Changelog¶
Version 2023.3.24¶
Compatibility with Python 3.10
Dropped support for Python 3.7
Compatibility with scikit-learn 1.2.0 and newer
Version 2021.11.30¶
Fixed regression in meta inference for wrappers when the base estimator returned a scipy.sparse matrix (GH#889)
Version 2021.11.16¶
Version 2021.10.17¶
Added support for scikit-learn 1.0.0. scikit-learn 1.0.0 is now the minimum-supported version.
Version 1.9.0¶
Version 1.8.0¶
Compatibility with scikit-learn 0.24
Version 1.7.0¶
Improved documentation for working with PyTorch models, see pytorch (GH#699)
Improved documentation for working with Keras / TensorFlow models, see Keras and Tensorflow (GH#713)
Fixed handling of remote vocabularies in dask_ml.feature_extraction.text.HashingVectorizer (GH#719)
Added dask_ml.metrics.regression.mean_squared_log_error() (GH#725)
Allow user-provided categories in dask_ml.preprocessing.OneHotEncoder (GH#727)
Added dask_ml.linear_model.LogisticRegression.decision_function() (GH#728)
Added compute argument to dask_ml.decomposition.TruncatedSVD (GH#743)
Fixed sign stability in incremental PCA (GH#742)
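For readers unfamiliar with the metric added in GH#725, the following is a minimal pure-Python sketch of mean squared log error as it is conventionally defined (the mean of squared differences between log(1 + y_true) and log(1 + y_pred)). The function name msle_sketch is hypothetical and this is not the dask-ml implementation, which operates on Dask arrays.

```python
import math


def msle_sketch(y_true, y_pred):
    """Mean squared log error for two equal-length sequences of non-negative numbers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # log1p(x) computes log(1 + x), which stays finite when x == 0.
    squared_log_diffs = [
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return sum(squared_log_diffs) / len(squared_log_diffs)


# Identical predictions give an error of zero.
perfect = msle_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```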
Version 1.6.0¶
Improved documentation for RandomizedSearchCV
Improved logging in dask_ml.cluster.KMeans (GH#688)
Added support for dask.dataframe objects in dask_ml.model_selection.HyperbandSearchCV (GH#701)
Added squared=True option to dask_ml.metrics.mean_squared_error (GH#707)
Added dask_ml.feature_extraction.text.CountVectorizer (GH#705)
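As an illustration of the squared option from GH#707, here is a small pure-Python sketch. It assumes the option follows the scikit-learn meaning of the same keyword: squared=True returns the mean squared error and squared=False returns its square root (RMSE). The name mse_sketch is hypothetical and the code is not taken from dask-ml.

```python
import math


def mse_sketch(y_true, y_pred, squared=True):
    """Return MSE when squared=True, RMSE when squared=False."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    mse = sum(squared_errors) / len(squared_errors)
    return mse if squared else math.sqrt(mse)


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse = mse_sketch(y_true, y_pred)                  # mean of the squared errors
rmse = mse_sketch(y_true, y_pred, squared=False)  # square root of the value above
```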
Version 1.5.0¶
Support for Python 3.8 (GH#669)
Compatibility with Scikit-Learn 0.23.0 (GH#669)
Scikit-Learn 0.23.0 or newer is now required (GH#669)
Removed previously deprecated Partial classes. Use dask_ml.wrappers.Incremental instead (GH#674)
Version 1.4.0¶
Added dask_ml.decomposition.IncrementalPCA for out-of-core / distributed incremental PCA (GH#619)
Improved logging and monitoring in incremental model selection (GH#528)
Added dask_ml.ensemble.BlockwiseVotingClassifier and dask_ml.ensemble.BlockwiseVotingRegressor for blockwise training and ensemble prediction (GH#657)
Improved documentation for Hyper Parameter Search (GH#432)
Version 1.3.0¶
Added shuffle support to dask_ml.model_selection.train_test_split() for DataFrame input (GH#625)
Improved performance of dask_ml.model_selection.GridSearchCV by re-using cached tasks (GH#622)
Add support for DataFrame to dask_ml.model_selection.GridSearchCV (GH#612)
Fixed dask_ml.linear_model.LinearRegression.score() to use r2_score rather than mse (GH#614)
Handle missing data in dask_ml.preprocessing.StandardScaler (GH#608)
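The LinearRegression.score() change in GH#614 swaps one metric for another, so scores from before and after this release are not comparable. The sketch below contrasts the two quantities in plain Python. The helper names are hypothetical and the code is only an illustration of the standard definitions, not the dask-ml implementation.

```python
def mse_of(y_true, y_pred):
    """Mean squared error: lower is better, units are squared target units."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_of(y_true, y_pred):
    """Coefficient of determination: 1.0 is a perfect fit, 0.0 matches predicting the mean.

    This sketch does not handle a constant y_true (zero total variance).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
mean_prediction = [2.5, 2.5, 2.5, 2.5]

r2_perfect = r2_of(y_true, y_true)            # perfect predictions
r2_mean = r2_of(y_true, mean_prediction)      # always predicting the mean
mse_mean = mse_of(y_true, mean_prediction)
```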
Version 1.2.0¶
Changed the name of the second positional argument in model_selection.IncrementalSearchCV from param_distribution to parameters to match the name of the base class.
Compatibility with scikit-learn 0.22.1.
Added dask_ml.preprocessing.BlockTransformer, an extension of scikit-learn’s FunctionTransformer (GH#366).
Added dask_ml.feature_extraction.FeatureHasher, which is similar to scikit-learn’s implementation.
Version 1.1.1¶
Version 1.1.0¶
Non-arrays (e.g. Dask Bags and DataFrames) are now allowed in dask_ml.wrappers.Incremental. This is useful for text classification pipelines (GH#570)
The index is now preserved in dask_ml.preprocessing.PolynomialFeatures for DataFrame inputs (GH#563)
dask_ml.decomposition.PCA now works with DataFrame inputs (GH#543)
dask_ml.cluster.KMeans handles inputs where some blocks are length-0 (GH#559)
Improved error reporting for mixed inputs to dask_ml.model_selection.train_test_split() (GH#552)
Removed deprecated dask_ml.joblib module. Use joblib.parallel_backend instead (GH#545)
dask_ml.preprocessing.QuantileTransformer now handles DataFrame input (GH#533)
Version 1.0.0¶
Added new hyperparameter search meta-estimators for hyperparameter search on distributed datasets: HyperbandSearchCV and SuccessiveHalvingSearchCV
Dropped Python 2 support (GH#500)
Version 0.13.0¶
Compatibility with scikit-learn 0.21.1
Cross-validation results in GridSearchCV and RandomizedSearchCV are now gathered as completed, in case a worker is lost (GH#433)
Fixed bug in dask_ml.model_selection.train_test_split() when only one of train / test size is provided (GH#502)
Consistent random state for dask_ml.model_selection.IncrementalSearchCV
Fixed various issues with 32-bit Windows builds (GH#487)
Note
dask-ml 0.13.0 will be the last release to support Python 2.
Version 0.12.0¶
API Breaking Changes¶
dask_ml.model_selection.IncrementalSearchCV now returns Dask objects for post-fit methods like .predict, etc. (GH#423).
Version 0.11.0¶
Note that this version of Dask-ML requires scikit-learn >= 0.20.0.
Enhancements¶
Added dask_ml.model_selection.IncrementalSearchCV, a meta-estimator for hyperparameter optimization on larger-than-memory datasets (GH#356). See Incremental Hyperparameter Optimization for more.
Added dask_ml.preprocessing.PolynomialTransformer, a drop-in replacement for the scikit-learn version (GH#347).
Added auto-rechunking to Dask Arrays with more than one block along the features in dask_ml.model_selection.ParallelPostFit (GH#376)
Added support for Dask DataFrame inputs to dask_ml.cluster.KMeans (GH#390)
Added a compute keyword to dask_ml.wrappers.ParallelPostFit.score() to support lazily evaluating a model’s score (GH#402)
Bug Fixes¶
Changed dask_ml.wrappers.ParallelPostFit to automatically rechunk input arrays to methods like predict when they have more than one block along the features (GH#376).
Bug in dask_ml.impute.SimpleImputer with Dask DataFrames filling the count of the most frequent item, rather than the item itself (GH#385).
Bug in dask_ml.model_selection.ShuffleSplit returning the same split when the random_state was set (GH#380).
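The SimpleImputer fix in GH#385 concerns a confusion between a value and its frequency. The plain-Python sketch below reproduces that class of mistake with collections.Counter. It is only an illustration under assumed sample data and is not the code that was changed in dask-ml.

```python
from collections import Counter

# Sample column with missing entries marked as None (illustrative data).
values = ["a", "b", "a", None, "c", "a", None]

counts = Counter(v for v in values if v is not None)
# most_common(1) returns a list with one (item, count) pair.
most_frequent_item, most_frequent_count = counts.most_common(1)[0]

# Intended behaviour: fill missing entries with the most frequent item.
filled_correct = [most_frequent_item if v is None else v for v in values]

# The kind of bug described above: filling with the count instead of the item.
filled_buggy = [most_frequent_count if v is None else v for v in values]
```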
Version 0.10.0¶
Enhancements¶
Added support for dask.dataframe.DataFrame to dask_ml.model_selection.train_test_split() (GH#351)
Version 0.9.0¶
Enhancements¶
Bug Fixes¶
Fixed handling of errors in the predict and score steps of dask_ml.model_selection.GridSearchCV and dask_ml.model_selection.RandomizedSearchCV (GH#339)
Compatibility with Dask 0.18 for dask_ml.preprocessing.LabelEncoder (you’ll also notice improved performance) (GH#336).
Documentation Updates¶
Added a Dask-ML Roadmap. Please open an issue if you’d like something to be included on the roadmap. (GH#322)
Added many Examples to the documentation and the dask examples binder.
Version 0.8.0¶
Enhancements¶
Automatically replace default scikit-learn scorers with dask-aware versions in Incremental (GH#200)
Added the dask_ml.metrics.log_loss() loss function and neg_log_loss scorer (GH#318)
Fixed handling of array-like fit-parameters to GridSearchCV and BaseSearchCV (GH#320)
Version 0.7.0¶
Enhancements¶
Added sample_weight support for dask_ml.metrics.accuracy_score(). (GH#217)
Improved performance of training on dask_ml.cluster.SpectralClustering (GH#152)
Added dask_ml.preprocessing.LabelEncoder. (GH#226)
Fixed issue in model_selection meta-estimators not respecting the default Dask scheduler (GH#260)
API Breaking Changes¶
Removed the basis_inds_ attribute from dask_ml.cluster.SpectralClustering as it’s no longer used (GH#152)
Changed dask_ml.wrappers.Incremental.fit() to clone the underlying estimator before training (GH#258). This induces a few changes:
The underlying estimator no longer gives access to learned attributes like coef_. We recommend using Incremental.coef_.
State no longer leaks between successive fit calls. Note that Incremental.partial_fit() is still available if you want state, like learned attributes or random seeds, to be re-used. This is useful if you’re making multiple passes over the training data.
Changed get_params and set_params for dask_ml.wrappers.Incremental to no longer magically get / set parameters for the underlying estimator (GH#258). To specify parameters for the underlying estimator, use the double-underscore prefix convention established by scikit-learn: inc.set_params(estimator__alpha=10)
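To make the double-underscore convention concrete, the following is a toy sketch of how a key such as estimator__alpha can be routed to a wrapped object. ToyEstimator and ToyWrapper are hypothetical classes written for this illustration. The real scikit-learn set_params also validates parameter names and delegates to nested set_params calls, which this sketch omits.

```python
class ToyEstimator:
    """Stand-in for an underlying estimator with one hyperparameter."""

    def __init__(self, alpha=0.0001):
        self.alpha = alpha


class ToyWrapper:
    """Stand-in for a meta-estimator that holds an `estimator` attribute."""

    def __init__(self, estimator):
        self.estimator = estimator

    def set_params(self, **params):
        for key, value in params.items():
            # Split "estimator__alpha" into ("estimator", "__", "alpha").
            name, sep, sub_name = key.partition("__")
            if sep:
                # Prefixed key: set the attribute on the wrapped object.
                setattr(getattr(self, name), sub_name, value)
            else:
                # Plain key: set the attribute on the wrapper itself.
                setattr(self, name, value)
        return self


inc = ToyWrapper(ToyEstimator())
inc.set_params(estimator__alpha=10)
```

After the call, inc.estimator.alpha holds the new value while the wrapper's own attributes are unchanged.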
Reorganization¶
Dask-SearchCV is now being developed in the dask/dask-ml repository. Users who previously installed dask-searchcv should now just install dask-ml.
Version 0.6.0¶
API Breaking Changes¶
Removed the get keyword from the incremental learner fit methods. (GH#187)
Deprecated the various Partial* estimators in favor of the dask_ml.wrappers.Incremental meta-estimator (GH#190)
Enhancements¶
Added a new meta-estimator dask_ml.wrappers.Incremental for wrapping any estimator with a partial_fit method. See Incremental Meta-estimator for more. (GH#190)
Added an R2-score metric dask_ml.metrics.r2_score().
Version 0.5.0¶
API Breaking Changes¶
The n_samples_seen_ attribute on dask_ml.preprocessing.StandardScaler is now consistently numpy.nan (GH#157).
Changed the algorithm for dask_ml.datasets.make_blobs(), dask_ml.datasets.make_regression() and dask_ml.datasets.make_classification() to reduce the single-machine peak memory usage (GH#67)
Enhancements¶
Added dask_ml.model_selection.train_test_split() and dask_ml.model_selection.ShuffleSplit (GH#172)
Added dask_ml.metrics.classification_score(), dask_ml.metrics.mean_absolute_error(), and dask_ml.metrics.mean_squared_error().
Version 0.4.1¶
This release added several new estimators.
Enhancements¶
Added dask_ml.preprocessing.RobustScaler¶
Scale features using statistics that are robust to outliers. This mirrors sklearn.preprocessing.RobustScaler (GH#62).
Added dask_ml.preprocessing.OrdinalEncoder¶
Encodes categorical features as ordinal, in one ordered feature (GH#119).
Added dask_ml.wrappers.ParallelPostFit¶
A meta-estimator for fitting with any scikit-learn estimator, but post-processing (predict, transform, etc.) in parallel on dask arrays. See Parallel Meta-estimators for more (GH#132).
Version 0.4.0¶
API Changes¶
Changed the arguments of the dask-glm based estimators in dask_glm.linear_model to match scikit-learn’s API (GH#94).
To specify lambduh use C = 1.0 / lambduh (the default of 1.0 is unchanged)
The rho, over_relax, abstol and reltol arguments have been removed. Provide them in solver_kwargs instead.
This affects the LinearRegression, LogisticRegression and PoissonRegression estimators.
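For code that previously passed lambduh and solver options directly, the migration can be sketched as below. The helper name lambduh_to_C is hypothetical, and the numeric values in solver_kwargs are placeholders chosen for illustration rather than library defaults.

```python
def lambduh_to_C(lambduh):
    """Convert a regularization strength to the scikit-learn style inverse, C = 1.0 / lambduh."""
    if lambduh <= 0:
        raise ValueError("lambduh must be positive to have a finite inverse")
    return 1.0 / lambduh


# A stronger penalty (larger lambduh) maps to a smaller C.
C = lambduh_to_C(0.5)

# Options that used to be separate keyword arguments are grouped in one dict.
# The values here are placeholders, not recommended settings.
solver_kwargs = {"rho": 1.0, "over_relax": 1.0, "abstol": 1e-4, "reltol": 1e-2}

# Assumed usage, per the notes above (not executed here):
# LogisticRegression(C=C, solver_kwargs=solver_kwargs)
```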