dask_ml.datasets.make_classification_df
- dask_ml.datasets.make_classification_df(n_samples=10000, response_rate=0.5, predictability=0.1, random_state=None, chunks=None, dates=None, **kwargs)
Uses the make_classification function to create a Dask DataFrame for testing.
- Parameters
- n_samples : int, default is 10000
Number of observations to generate.
- response_rate : float between 0.0 and 0.5, default is 0.5
Fraction of the sample to be response records. The maximum is 0.5.
- predictability : float between 0.0 and 1.0, default is 0.1
How easy the response is to predict (1.0 being easiest).
- random_state : int, default is None
Seed for reproducibility.
- chunks : int
How to chunk the array. Must be a block size, such as 1000.
- dates : tuple, optional, default is None
Tuple of start and end date objects used to generate random dates in the date column.
- **kwargs
Other keyword arguments to pass to sklearn.datasets.make_classification
- Returns
- X : Dask DataFrame of shape [n_samples, n_features], or [n_samples, n_features + 1] when dates is specified
The input samples.
- y : Dask Series of shape [n_samples] or [n_samples, n_targets]
The output values.