dask_ml.datasets.make_counts(n_samples=1000, n_features=100, n_informative=2, scale=1.0, chunks=100, random_state=None)

Generate a dummy dataset for modeling count data.

n_samples : int

Number of rows in the output array.

n_features : int

Number of columns (features) in the output array.

n_informative : int

Number of features that are correlated with the outcome.

scale : float

Factor by which the true coefficient array is scaled.

chunks : int

Number of rows per dask array block.

random_state : int, RandomState instance or None (default)

Determines random number generation for dataset creation. Pass an int for reproducible output across multiple function calls. See Glossary.

X : dask.array, size (n_samples, n_features)
y : dask.array, size (n_samples,)

Array of non-negative integer-valued data (the count outcome).
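To make the roles of `n_informative` and `scale` concrete, the following NumPy-only sketch shows one plausible way such count data could be generated. It is an illustration under stated assumptions, not dask-ml's actual implementation: the Poisson model with a log link, the coefficient range, and every variable name here are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical NumPy-only sketch; NOT dask-ml's implementation.
rng = np.random.default_rng(0)
n_samples, n_features, n_informative, scale = 1000, 100, 2, 1.0

# Feature matrix: one row per sample, one column per feature.
X = rng.normal(size=(n_samples, n_features))

# Choose the columns that influence the outcome; every other coefficient is zero.
informative = rng.choice(n_features, size=n_informative, replace=False)
beta = np.zeros(n_features)
beta[informative] = scale * rng.uniform(-1.0, 0.0, size=n_informative)

# Assumed log link: the Poisson rate is exp(X @ beta), so y holds
# non-negative integers that depend only on the informative columns.
rate = np.exp(X @ beta)
y = rng.poisson(rate)

print(X.shape, y.shape)
```

In this sketch a larger `scale` strengthens the dependence of `y` on the informative columns, while the remaining columns act as pure noise.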


>>> from dask_ml.datasets import make_counts
>>> X, y = make_counts()