dask_ml.datasets.make_classification_df

dask_ml.datasets.make_classification_df(n_samples=10000, response_rate=0.5, predictability=0.1, random_state=None, chunks=None, dates=None, **kwargs)

Uses the make_classification function to create a Dask DataFrame for testing.

Parameters
n_samples : int, default is 10000

Number of observations to be generated.

response_rate : float between 0.0 and 0.5, default is 0.5

Fraction of the sample made up of response records. The maximum is 0.5.

predictability : float between 0.0 and 1.0, default is 0.1

How easy the response is to predict (1.0 being easiest).

random_state : int, default is None

Seed for reproducibility.

chunks : int

How to chunk the array. Must be one of the following forms:

- A blocksize like 1000.

dates : tuple, optional, default is None

Tuple of start and end date objects used to generate random dates in the date column.

**kwargs

Other keyword arguments to pass to sklearn.datasets.make_classification

Returns
X : Dask DataFrame of shape [n_samples, n_features], or [n_samples, n_features + 1] when dates is specified

The input samples.

y : Dask Series of shape [n_samples] or [n_samples, n_targets]

The output values.