Only the sample dataset as a result


Hello, I am new to Dataiku.

When I analyse or download my final dataset, the operation only runs on the sample (the first records) defined in the preparation steps, whereas I want it to run on all my data.

I read the documentation but could not find anything about this. Maybe it is my settings, or maybe there is a size limit? (I use a professional licence.) Also, the prepared datasets were run on local stream; could that explain it?

Thank you for your help


Dataiker Alumni

Hi @leaw and welcome to the Dataiku Community! While you wait for a more complete response, I wanted to at least provide some helpful information.

By default, the first 10,000 records of your dataset are selected for the sample. While this sampling method does not provide the best sample quality, it allows you to get your sample very quickly, whatever the size of your dataset.
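To make the trade-off concrete, here is a minimal plain-Python sketch of what "first records" sampling implies. It is only an illustration, not Dataiku's implementation: the `head_sample` helper and the synthetic 50,000-row dataset are hypothetical, and only the 10,000 figure comes from the default described above.

```python
from itertools import islice

SAMPLE_SIZE = 10_000  # default "first records" sample size mentioned above


def head_sample(records, n=SAMPLE_SIZE):
    """Return the first n records without reading the rest (hypothetical helper)."""
    return list(islice(records, n))


# Hypothetical dataset: 50,000 rows whose 'value' grows with the row index,
# so the first rows are not representative of the whole.
full_dataset = [{"row": i, "value": i} for i in range(50_000)]

sample = head_sample(iter(full_dataset))

# Any statistic computed on the sample only reflects the first 10,000 rows.
sample_mean = sum(r["value"] for r in sample) / len(sample)
full_mean = sum(r["value"] for r in full_dataset) / len(full_dataset)

print(len(sample), len(full_dataset))
print(sample_mean, full_mean)
```

The head sample is fast because it stops reading after 10,000 rows, but when the data is ordered, as in this synthetic case, statistics computed on it can differ a lot from those of the full dataset.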

The sampling can be configured in the “Sampling” tab. More information about Sampling can be found here:

1) Sampling (Documentation)
2) Concept: Sampling  (Academy and Knowledge Base)

I hope this helps!
