Entering a seed for Randomly dispatch data on the output datasets of a split recipe

MRvLuijpen

Hello Community.

In the split recipe, we have the option to use the dispatch mode "Random subset of column(s) value".

My question: is it possible to set a seed for this mode, so that it always produces the same 'random' result?

Thanks in advance.

Marc Robert

KeijiY
Dataiker

Hello @MRvLuijpen,

Thank you so much for posting a question on Dataiku Community.

A seed cannot be set for the "Random subset of column(s) value" mode. However, this mode is actually deterministic: the results stay the same as long as the input data and the recipe's settings do not change, so there is no need to set a seed for it.
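One way a "random" dispatch can still be fully deterministic is to derive each row's side from a hash of its column value instead of from a random number generator. The Python sketch below illustrates that idea. It is an assumption for illustration only, not Dataiku's actual implementation, and the helper names value_to_bucket and dispatch_by_value are hypothetical.

```python
import hashlib


def value_to_bucket(value, salt=""):
    """Map a column value to a stable float in [0, 1) using an MD5 digest."""
    digest = hashlib.md5(f"{salt}{value}".encode("utf-8")).hexdigest()
    # The first 8 hex digits give an integer in [0, 2**32); scale it into [0, 1).
    return int(digest[:8], 16) / 0x100000000


def dispatch_by_value(rows, column, ratio):
    """Split rows in two; every row sharing a column value goes to the same side.

    Hypothetical illustration: the side depends only on the hashed value,
    so the same input always yields the same split.
    """
    first, second = [], []
    for row in rows:
        if value_to_bucket(row[column]) < ratio:
            first.append(row)
        else:
            second.append(row)
    return first, second


rows = [{"customer": c, "amount": i} for i, c in enumerate("aabbccddee")]
first, second = dispatch_by_value(rows, "customer", 0.5)
print(sorted({r["customer"] for r in first}), sorted({r["customer"] for r in second}))
```

Rerunning this on unchanged rows reproduces the same split. Python's built-in hash() is avoided on purpose: for strings it is randomized per process unless PYTHONHASHSEED is set, so it would not give stable results across runs.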

I hope this helps. Please let us know if you have any further questions about this topic.

Sincerely,
Keiji

MRvLuijpen
Author

Hello @KeijiY,

Thank you for your response. It is clear.

Would it be possible to add an option to set a seed for this mode?

For our business case, we would like to set a seed so that we can create multiple random, but different, subsets.

For reference: the random_state parameter of scikit-learn's GroupShuffleSplit.
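If such an option existed, it could behave like random_state in GroupShuffleSplit: the same seed reproduces the same subset of groups, and a different seed selects another one. The standard-library sketch below illustrates that behaviour. seeded_group_split is a hypothetical helper, not an existing Dataiku or scikit-learn API, and it assumes the column values are sortable.

```python
import random


def seeded_group_split(rows, column, test_fraction, seed):
    """Send a seeded random subset of distinct column values to 'test'.

    Hypothetical helper: rows sharing a value always land on the same side,
    similar in spirit to GroupShuffleSplit(random_state=seed).
    """
    # Sort first so the result does not depend on the order of the input rows.
    values = sorted({row[column] for row in rows})
    rng = random.Random(seed)  # private generator; the global random state is untouched
    rng.shuffle(values)
    n_test = round(len(values) * test_fraction)
    test_values = set(values[:n_test])
    train = [row for row in rows if row[column] not in test_values]
    test = [row for row in rows if row[column] in test_values]
    return train, test


rows = [{"customer": c, "amount": i} for i, c in enumerate("aabbccddee")]
for seed in (1, 2, 3):
    train, test = seeded_group_split(rows, "customer", 0.4, seed)
    print(seed, sorted({row["customer"] for row in test}))
```

Each seed gives a reproducible split, and different seeds will usually, though not always, pick different subsets, which is the kind of control described above.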

Thanks again.

Sincerely, 

Marc Robert