dask_ml.datasets.make_classification_df
- dask_ml.datasets.make_classification_df(n_samples=10000, response_rate=0.5, predictability=0.1, random_state=None, chunks=None, dates=None, **kwargs)
- Uses the make_classification function to create a Dask DataFrame for testing.
- Parameters
- n_samples : int, default is 10000
- Number of observations to generate.
- response_rate : float between 0.0 and 0.5, default is 0.5
- Fraction of the samples that are response records; the maximum is 0.5.
- predictability : float between 0.0 and 1.0, default is 0.1
- How easy the response is to predict, with 1.0 being the easiest.
- random_state : int, default is None
- Seed for reproducibility.
- chunks : int
- How to chunk the array, given as a block size such as 1000.
- dates : tuple, optional, default is None
- Tuple of start and end date objects used to generate random dates for the date column.
- **kwargs
- Other keyword arguments passed to sklearn.datasets.make_classification.
 
- Returns
- X : Dask DataFrame of shape [n_samples, n_features], or [n_samples, n_features + 1] when dates is specified
- The input samples.
- y : Dask Series of shape [n_samples] or [n_samples, n_targets]
- The output values.