dask_ml.datasets.make_classification_df

dask_ml.datasets.make_classification_df(n_samples=10000, response_rate=0.5, predictability=0.1, random_state=None, chunks=None, dates=None, **kwargs)

Uses the make_classification function to create a Dask DataFrame for testing.

Parameters

n_samples (int, default is 10000): Number of observations to generate.
response_rate (float between 0.0 and 0.5, default is 0.5): Fraction of the sample that consists of response records. The maximum is 0.5.
predictability (float between 0.0 and 1.0, default is 0.1): How easy the response is to predict (1.0 being easiest).
random_state (int, default is None): Seed for reproducibility.
chunks (int): How to chunk the array. Must be one of the following forms:
- A blocksize like 1000.
dates (tuple, optional, default is None): Tuple of start and end date objects used to generate random dates in the date column.
**kwargs: Other keyword arguments to pass to sklearn.datasets.make_classification.

Returns

X (Dask DataFrame of shape [n_samples, n_features], or [n_samples, n_features + 1] when dates is specified): The input samples.
y (Dask Series of shape [n_samples] or [n_samples, n_targets]): The output values.
