Changelog¶
Version 2025.1.0¶
Version 2023.3.24¶
Compatibility with Python 3.10
Dropped support for Python 3.7
Compatibility with scikit-learn 1.2.0 and newer
Version 2021.11.30¶
Fixed regression in meta inference for wrappers when the base estimator returned a scipy.sparse matrix (GH#889)
Version 2021.11.16¶
Version 2021.10.17¶
Added support for scikit-learn 1.0.0. scikit-learn 1.0.0 is now the minimum-supported version.
Version 1.9.0¶
Version 1.8.0¶
Compatibility with scikit-learn 0.24
Version 1.7.0¶
Improved documentation for working with PyTorch models, see pytorch (GH#699)
Improved documentation for working with Keras / TensorFlow models, see Keras and Tensorflow (GH#713)
Fixed handling of remote vocabularies in dask_ml.feature_extraction.text.HashingVectorizer (GH#719)
Added dask_ml.metrics.regression.mean_squared_log_error() (GH#725)
Allow user-provided categories in dask_ml.preprocessing.OneHotEncoder (GH#727)
Added dask_ml.linear_model.LogisticRegression.decision_function() (GH#728)
Added compute argument to dask_ml.decomposition.TruncatedSVD (GH#743)
Fixed sign stability in incremental PCA (GH#742)
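For readers unfamiliar with the metric added in GH#725, the quantity usually called mean squared log error can be sketched in plain Python. This is an illustration of the standard formula only, not dask-ml's implementation (which operates on Dask arrays); the function name `msle_sketch` is hypothetical.

```python
import math


def msle_sketch(y_true, y_pred):
    """Mean of squared differences between log(1 + y_true) and log(1 + y_pred).

    Hypothetical helper for illustration; assumes non-negative inputs
    of equal, non-zero length.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(v < 0 for v in list(y_true) + list(y_pred)):
        raise ValueError("the metric is undefined for negative values")
    squared_log_diffs = [
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return sum(squared_log_diffs) / len(squared_log_diffs)


# Identical targets and predictions give an error of zero.
perfect_error = msle_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

Because the differences are taken on a log scale, the metric penalizes relative rather than absolute errors, which is why it is often used for targets spanning several orders of magnitude.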
Version 1.6.0¶
Improved documentation for RandomizedSearchCV
Improved logging in dask_ml.cluster.KMeans (GH#688)
Added support for dask.dataframe objects in dask_ml.model_selection.HyperbandSearchCV (GH#701)
Added squared=True option to dask_ml.metrics.mean_squared_error (GH#707)
Added dask_ml.feature_extraction.text.CountVectorizer (GH#705)
Version 1.5.0¶
Support for Python 3.8 (GH#669)
Compatibility with Scikit-Learn 0.23.0 (GH#669)
Scikit-Learn 0.23.0 or newer is now required (GH#669)
Removed previously deprecated Partial classes. Use dask_ml.wrappers.Incremental instead (GH#674)
Version 1.4.0¶
Added dask_ml.decomposition.IncrementalPCA for out-of-core / distributed incremental PCA (GH#619)
Improved logging and monitoring in incremental model selection (GH#528)
Added dask_ml.ensemble.BlockwiseVotingClassifier and dask_ml.ensemble.BlockwiseVotingRegressor for blockwise training and ensemble prediction (GH#657)
Improved documentation for Hyper Parameter Search (GH#432)
Version 1.3.0¶
Added shuffle support to dask_ml.model_selection.train_test_split() for DataFrame input (GH#625)
Improved performance of dask_ml.model_selection.GridSearchCV by re-using cached tasks (GH#622)
Add support for DataFrame to dask_ml.model_selection.GridSearchCV (GH#612)
Fixed dask_ml.linear_model.LinearRegression.score() to use r2_score rather than mse (GH#614)
Handle missing data in dask_ml.preprocessing.StandardScaler (GH#608)
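The LinearRegression.score() change in GH#614 matters because R² and mean squared error point in opposite directions: a higher R² is better, while a lower MSE is better. The following plain-Python sketch contrasts the two; the helper names are hypothetical and the code is not taken from dask-ml.

```python
def mse_sketch(y_true, y_pred):
    """Mean squared error: lower is better, 0.0 is a perfect fit."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_sketch(y_true, y_pred):
    """Coefficient of determination: higher is better, 1.0 is a perfect fit.

    Assumes y_true is not constant, so the total sum of squares is non-zero.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]
baseline = [2.5, 2.5, 2.5, 2.5]  # always predicts the mean of y_true

r2_perfect = r2_sketch(y_true, perfect)
r2_baseline = r2_sketch(y_true, baseline)
mse_baseline = mse_sketch(y_true, baseline)
```

A model that always predicts the mean scores an R² of zero, which gives score() a scale that is comparable across datasets; the raw MSE depends on the units of the target.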
Version 1.2.0¶
Changed the name of the second positional argument in model_selection.IncrementalSearchCV from param_distribution to parameters to match the name of the base class.
Compatibility with scikit-learn 0.22.1.
Added dask_ml.preprocessing.BlockTransformer, an extension of scikit-learn’s FunctionTransformer (GH#366).
Added dask_ml.feature_extraction.FeatureHasher, which is similar to scikit-learn’s implementation.
Version 1.1.1¶
Version 1.1.0¶
Non-arrays (e.g. Dask Bags and DataFrames) are now allowed in dask_ml.wrappers.Incremental. This is useful for text classification pipelines (GH#570)
The index is now preserved in dask_ml.preprocessing.PolynomialFeatures for DataFrame inputs (GH#563)
dask_ml.decomposition.PCA now works with DataFrame inputs (GH#543)
dask_ml.cluster.KMeans handles inputs where some blocks are length-0 (GH#559)
Improved error reporting for mixed inputs to dask_ml.model_selection.train_test_split() (GH#552)
Removed deprecated dask_ml.joblib module. Use joblib.parallel_backend instead (GH#545)
dask_ml.preprocessing.QuantileTransformer now handles DataFrame input (GH#533)
Version 1.0.0¶
Added new meta-estimators for hyperparameter search on distributed datasets: HyperbandSearchCV and SuccessiveHalvingSearchCV
Dropped Python 2 support (GH#500)
Version 0.13.0¶
Compatibility with scikit-learn 0.21.1
Cross-validation results in GridSearchCV and RandomizedSearchCV are now gathered as completed, in case a worker is lost (GH#433)
Fixed bug in dask_ml.model_selection.train_test_split() when only one of train / test size is provided (GH#502)
Consistent random state for dask_ml.model_selection.IncrementalSearchCV
Fixed various issues with 32-bit Windows builds (GH#487)
Note
dask-ml 0.13.0 will be the last release to support Python 2.
Version 0.12.0¶
API Breaking Changes¶
dask_ml.model_selection.IncrementalSearchCV now returns Dask objects for post-fit methods like .predict, etc. (GH#423).
Version 0.11.0¶
Note that this version of Dask-ML requires scikit-learn >= 0.20.0.
Enhancements¶
Added dask_ml.model_selection.IncrementalSearchCV, a meta-estimator for hyperparameter optimization on larger-than-memory datasets (GH#356). See Incremental Hyperparameter Optimization for more.
Added dask_ml.preprocessing.PolynomialFeatures, a drop-in replacement for the scikit-learn version (GH#347).
Added auto-rechunking to Dask Arrays with more than one block along the features in dask_ml.wrappers.ParallelPostFit (GH#376)
Added support for Dask DataFrame inputs to dask_ml.cluster.KMeans (GH#390)
Added a compute keyword to dask_ml.wrappers.ParallelPostFit.score() to support lazily evaluating a model’s score (GH#402)
Bug Fixes¶
Changed dask_ml.wrappers.ParallelPostFit to automatically rechunk input arrays to methods like predict when they have more than one block along the features (GH#376).
Bug in dask_ml.impute.SimpleImputer with Dask DataFrames filling the count of the most frequent item, rather than the item itself (GH#385).
Bug in dask_ml.model_selection.ShuffleSplit returning the same split when the random_state was set (GH#380).
Version 0.10.0¶
Enhancements¶
Added support for dask.dataframe.DataFrame to dask_ml.model_selection.train_test_split() (GH#351)
Version 0.9.0¶
Enhancements¶
Bug Fixes¶
Fixed handling of errors in the predict and score steps of dask_ml.model_selection.GridSearchCV and dask_ml.model_selection.RandomizedSearchCV (GH#339)
Compatibility with Dask 0.18 for dask_ml.preprocessing.LabelEncoder (you’ll also notice improved performance) (GH#336).
Documentation Updates¶
Added a Dask-ML Roadmap. Please open an issue if you’d like something to be included on the roadmap. (GH#322)
Added many Examples to the documentation and the dask examples binder.
Version 0.8.0¶
Enhancements¶
Automatically replace default scikit-learn scorers with dask-aware versions in Incremental (GH#200)
Added the dask_ml.metrics.log_loss() loss function and neg_log_loss scorer (GH#318)
Fixed handling of array-like fit-parameters to GridSearchCV and BaseSearchCV (GH#320)
Version 0.7.0¶
Enhancements¶
Added sample_weight support for dask_ml.metrics.accuracy_score(). (GH#217)
Improved performance of training on dask_ml.cluster.SpectralClustering (GH#152)
Added dask_ml.preprocessing.LabelEncoder. (GH#226)
Fixed issue in model_selection meta-estimators not respecting the default Dask scheduler (GH#260)
API Breaking Changes¶
Removed the basis_inds_ attribute from dask_ml.cluster.SpectralClustering as it’s no longer used (GH#152)
Changed dask_ml.wrappers.Incremental.fit() to clone the underlying estimator before training (GH#258). This induces a few changes:
The underlying estimator no longer gives access to learned attributes like coef_. We recommend using Incremental.coef_.
State no longer leaks between successive fit calls. Note that Incremental.partial_fit() is still available if you want state, like learned attributes or random seeds, to be re-used. This is useful if you’re making multiple passes over the training data.
Changed get_params and set_params for dask_ml.wrappers.Incremental to no longer magically get / set parameters for the underlying estimator (GH#258). To specify parameters for the underlying estimator, use the double-underscore prefix convention established by scikit-learn: inc.set_params(estimator__alpha=10)
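The double-underscore convention can be illustrated without any dependencies. The sketch below uses two hypothetical toy classes, `ToyEstimator` and `ToyWrapper`, which stand in for a scikit-learn estimator and a meta-estimator; they are not dask-ml or scikit-learn code and only mimic how a name like estimator__alpha is routed to the wrapped object.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator with an `alpha` hyperparameter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def set_params(self, **params):
        for name, value in params.items():
            if not hasattr(self, name):
                raise ValueError(f"Invalid parameter {name!r} for ToyEstimator")
            setattr(self, name, value)
        return self


class ToyWrapper:
    """Hypothetical meta-estimator that routes `<attribute>__<param>` names."""

    def __init__(self, estimator, verbose=False):
        self.estimator = estimator
        self.verbose = verbose

    def set_params(self, **params):
        for name, value in params.items():
            prefix, sep, sub_name = name.partition("__")
            if sep:
                # Nested name: forward the remainder to the wrapped object.
                if not hasattr(self, prefix):
                    raise ValueError(f"Invalid parameter {name!r} for ToyWrapper")
                getattr(self, prefix).set_params(**{sub_name: value})
            elif hasattr(self, name):
                # Plain name: this is one of the wrapper's own parameters.
                setattr(self, name, value)
            else:
                # No implicit pass-through to the wrapped estimator.
                raise ValueError(f"Invalid parameter {name!r} for ToyWrapper")
        return self


inc = ToyWrapper(ToyEstimator())
inc.set_params(estimator__alpha=10)  # reaches the wrapped estimator
inc.set_params(verbose=True)         # sets the wrapper's own parameter
```

In this sketch, calling inc.set_params(alpha=10) raises a ValueError, because alpha belongs to the wrapped estimator and must be addressed with the estimator__ prefix.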
Reorganization¶
Dask-SearchCV is now being developed in the dask/dask-ml repository. Users
who previously installed dask-searchcv should now just install dask-ml.
Version 0.6.0¶
API Breaking Changes¶
Removed the get keyword from the incremental learner fit methods. (GH#187)
Deprecated the various Partial* estimators in favor of the dask_ml.wrappers.Incremental meta-estimator (GH#190)
Enhancements¶
Added a new meta-estimator dask_ml.wrappers.Incremental for wrapping any estimator with a partial_fit method. See Incremental Meta-estimator for more. (GH#190)
Added an R2-score metric dask_ml.metrics.r2_score().
Version 0.5.0¶
API Breaking Changes¶
The n_samples_seen_ attribute on dask_ml.preprocessing.StandardScaler is now consistently numpy.nan (GH#157).
Changed the algorithm for dask_ml.datasets.make_blobs(), dask_ml.datasets.make_regression() and dask_ml.datasets.make_classification() to reduce the single-machine peak memory usage (GH#67)
Enhancements¶
Added dask_ml.model_selection.train_test_split() and dask_ml.model_selection.ShuffleSplit (GH#172)
Added dask_ml.metrics.classification_score(), dask_ml.metrics.mean_absolute_error(), and dask_ml.metrics.mean_squared_error().
Version 0.4.1¶
This release added several new estimators.
Enhancements¶
Added dask_ml.preprocessing.RobustScaler¶
Scale features using statistics that are robust to outliers. This mirrors
sklearn.preprocessing.RobustScaler (GH#62).
Added dask_ml.preprocessing.OrdinalEncoder¶
Encodes categorical features as ordinal, in one ordered feature (GH#119).
Added dask_ml.wrappers.ParallelPostFit¶
A meta-estimator for fitting with any scikit-learn estimator, but post-processing
(predict, transform, etc.) in parallel on dask arrays.
See Parallel Meta-estimators for more (GH#132).
Version 0.4.0¶
API Changes¶
Changed the arguments of the dask-glm based estimators in dask_glm.linear_model to match scikit-learn’s API (GH#94).
To specify lambduh, use C = 1.0 / lambduh (the default of 1.0 is unchanged)
The rho, over_relax, abstol and reltol arguments have been removed. Provide them in solver_kwargs instead.
This affects the LinearRegression, LogisticRegression and PoissonRegression estimators.
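When migrating older code, the two argument changes above amount to a small translation step. The helper below is a hypothetical sketch (the name `translate_legacy_kwargs` is not part of dask-ml) that converts the old-style arguments into the C and solver_kwargs form described in these notes.

```python
# Solver options that these notes say were moved into solver_kwargs.
MOVED_SOLVER_OPTIONS = {"rho", "over_relax", "abstol", "reltol"}


def translate_legacy_kwargs(lambduh=1.0, **solver_options):
    """Map old-style arguments to the new `C` / `solver_kwargs` form.

    Hypothetical migration helper for illustration only.
    """
    if lambduh <= 0:
        raise ValueError("lambduh must be positive to compute C = 1.0 / lambduh")
    unknown = set(solver_options) - MOVED_SOLVER_OPTIONS
    if unknown:
        raise TypeError(f"Unexpected arguments: {sorted(unknown)}")
    return {"C": 1.0 / lambduh, "solver_kwargs": dict(solver_options)}


# A regularization strength of 4.0 corresponds to C = 0.25.
new_kwargs = translate_legacy_kwargs(lambduh=4.0, rho=2.0)
```

The resulting dictionary could then be unpacked into an estimator constructor, for example LogisticRegression(**new_kwargs), assuming the estimator accepts C and solver_kwargs as stated above.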