dask_ml.datasets.make_classification_df

dask_ml.datasets.make_classification_df(n_samples=10000, response_rate=0.5, predictability=0.1, random_state=None, chunks=None, dates=None, **kwargs)

Uses the make_classification function to create a Dask DataFrame for testing.

Parameters:
n_samples : int, default is 10000

number of observations to be generated

response_rate : float between 0.0 and 0.5, default is 0.5

proportion of the sample that should be response records; the maximum is 0.5

predictability : float between 0.0 and 1.0, default is 0.1

how hard the response is to predict (1.0 being easiest)

random_state : int, default is None

seed for reproducibility purposes

chunks : int

How to chunk the array. Must be one of the following forms:

- A blocksize like 1000.

dates : tuple, optional, default is None

tuple of start and end date objects to use for generating random dates in the date column

**kwargs

Other keyword arguments to pass to sklearn.datasets.make_classification

Returns:
X : Dask DataFrame of shape [n_samples, n_features] or [n_samples, n_features + 1] when dates specified

The input samples.

y : Dask Series of shape [n_samples] or [n_samples, n_targets]

The output values.