Entering a seed for Randomly dispatch data on the output datasets of a split recipe

MRvLuijpen · October 2021

Hello Community.

In the split recipe we have the option to use the dispatch mode "Random subset of column(s) value".

My question: is it possible to set a seed for this mode, so that it always produces the same 'random' result?

Thanks in advance.

Marc Robert

Keiji · October 2021

Hello @MRvLuijpen,

Thank you so much for posting a question on Dataiku Community.

A seed cannot be set for the "Random subset of column(s) value" mode. However, this mode is actually deterministic: the results stay the same as long as the input data and the recipe's settings do not change, so a seed is not needed to get reproducible results.
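To illustrate how a "random" dispatch on column values can be deterministic without a seed, here is a minimal Python sketch. It is an assumption made for illustration only, not Dataiku's actual implementation: the function name dispatch_by_hash, the ratio parameter and the sample rows are all hypothetical. The idea is that the output is chosen from a hash of the column value, so the same value always lands in the same output.

```python
import hashlib

def dispatch_by_hash(value, ratio=0.7):
    """Pick an output ('first' or 'second') from the column value alone.

    Hypothetical helper: the same value always maps to the same output,
    so no seed is involved and reruns give identical results.
    """
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    # Turn the first 8 bytes of the digest into a fraction in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return "first" if fraction < ratio else "second"

# Made-up sample rows; "customer" plays the role of the dispatch column.
rows = [
    {"customer": "A", "amount": 10},
    {"customer": "B", "amount": 20},
    {"customer": "A", "amount": 30},
    {"customer": "C", "amount": 40},
]

outputs = {"first": [], "second": []}
for row in rows:
    outputs[dispatch_by_hash(row["customer"])].append(row)
```

With this kind of scheme, all rows sharing a customer value end up in the same output, and the split only changes if the data or the ratio changes.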

I hope this helps. Please let us know if you have any further questions about this topic.

Sincerely,
Keiji

MRvLuijpen · October 2021

Hello @KeijiY,

Thank you for your response. It is clear.

Would it be an idea to add the option of setting a seed there?

For our business case we would like to set a seed in order to create multiple random, but different, subsets.

A reference: the random_state parameter of scikit-learn's GroupShuffleSplit.
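As a rough illustration of what is being asked for, the sketch below does a seeded split on column values in plain Python, for example inside a Python recipe. It is only a sketch under assumptions: seeded_group_split, its parameters and the sample rows are hypothetical, and it is neither scikit-learn's code nor an existing split recipe option. It shuffles the distinct column values with a seeded generator and sends the first share of them to one output.

```python
import random

def seeded_group_split(rows, column, ratio, seed):
    """Split rows into two lists, keeping each column value in one list only.

    Hypothetical helper: the same seed gives the same split, and
    different seeds can be used to draw other splits.
    """
    # Sort the distinct values so the result does not depend on row order.
    values = sorted({row[column] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(values)
    n_first = round(len(values) * ratio)
    first_values = set(values[:n_first])
    first = [row for row in rows if row[column] in first_values]
    second = [row for row in rows if row[column] not in first_values]
    return first, second

# Made-up sample rows; "customer" is the column to split on.
rows = [{"customer": c, "amount": i} for i, c in enumerate("AABBCCDDEE")]

first_a, second_a = seeded_group_split(rows, "customer", ratio=0.6, seed=1)
first_b, second_b = seeded_group_split(rows, "customer", ratio=0.6, seed=1)
first_c, second_c = seeded_group_split(rows, "customer", ratio=0.6, seed=2)
```

Here the two calls with seed=1 return identical lists, while seed=2 draws its own shuffle of the customer values.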

Thanks again.

Sincerely,

Marc Robert
