dask_ml.datasets.make_classification_df(n_samples=10000, response_rate=0.5, predictability=0.1, random_state=None, chunks=None, dates=None, **kwargs)

Uses the make_classification function to create a dask dataframe for testing.

Parameters

n_samples : int, default is 10000

Number of observations to generate.

response_rate : float between 0.0 and 0.5, default is 0.5

Fraction of the sample that should be response records. The maximum is 0.5.

predictability : float between 0.0 and 1.0, default is 0.1

How easy the response is to predict (1.0 is the easiest).

random_state : int, default is None

Seed for reproducibility.


chunks : int

How to chunk the array. Must be one of the following forms:

- A blocksize like 1000.

dates : tuple, optional, default is None

Tuple of start and end date objects used to generate random dates in the date column.


**kwargs

Other keyword arguments to pass to sklearn.datasets.make_classification
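The documented ranges can be checked before any data is generated. The following is a minimal sketch, not part of dask_ml: `validate_args` is a hypothetical helper, and its rules that `chunks` must be a positive integer and that the start date must not come after the end date are assumptions drawn from the descriptions above.

```python
from datetime import date


def validate_args(response_rate=0.5, predictability=0.1, chunks=None, dates=None):
    """Hypothetical pre-flight check for make_classification_df arguments."""
    # Documented range: float between 0.0 and 0.5.
    if not 0.0 <= response_rate <= 0.5:
        raise ValueError("response_rate must be between 0.0 and 0.5")
    # Documented range: float between 0.0 and 1.0.
    if not 0.0 <= predictability <= 1.0:
        raise ValueError("predictability must be between 0.0 and 1.0")
    # Assumption: a block size is a positive integer such as 1000.
    if chunks is not None and (not isinstance(chunks, int) or chunks <= 0):
        raise ValueError("chunks must be a positive block size such as 1000")
    if dates is not None:
        if len(dates) != 2:
            raise ValueError("dates must be a (start, end) tuple")
        start, end = dates
        # Assumption: the start date should not come after the end date.
        if start > end:
            raise ValueError("start date must not be after end date")
    return True


# Example: these values satisfy every documented constraint.
validate_args(
    response_rate=0.3,
    predictability=0.8,
    chunks=1000,
    dates=(date(2020, 1, 1), date(2020, 12, 31)),
)
```

A check like this fails early with a clear message instead of surfacing an error from deeper in the generation code.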

Returns

X : Dask DataFrame of shape [n_samples, n_features] or [n_samples, n_features + 1] when dates specified

The input samples.

y : Dask Series of shape [n_samples] or [n_samples, n_targets]

The output values.
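A call might look like the sketch below. The helper names `expected_shape` and `build_example`, the constant `EXAMPLE_DATES`, and the specific argument values are illustrative assumptions. `n_features` is assumed to be forwarded through `**kwargs`.

```python
from datetime import date

# Illustrative (start, end) pair for the dates argument.
EXAMPLE_DATES = (date(2020, 1, 1), date(2020, 12, 31))


def expected_shape(n_samples, n_features, dates=None):
    """Shape of X per the description above: one extra column when dates is given."""
    return (n_samples, n_features + (1 if dates is not None else 0))


def build_example(n_samples=1000, n_features=5, chunks=100, dates=None):
    """Generate a small test dataset; requires dask-ml to be installed."""
    # Imported lazily so the sketch can be read without dask-ml available.
    from dask_ml.datasets import make_classification_df

    X, y = make_classification_df(
        n_samples=n_samples,
        response_rate=0.3,
        predictability=0.8,
        random_state=0,
        chunks=chunks,
        dates=dates,
        n_features=n_features,  # assumed to be forwarded through **kwargs
    )
    return X, y
```

Calling `build_example(dates=EXAMPLE_DATES)` should, per the shapes documented above, return an `X` with `n_features + 1` columns. Both results are lazy Dask collections, so `X.head()` or `y.compute()` is needed to materialize values.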