Entering a seed for "Randomly dispatch data" on the output datasets of a Split recipe
Hello Community.
In the Split recipe we have the option to use the dispatch mode "Random subset of column(s) value".
My question: is it possible to set a seed for this mode, so that we always get the same 'random' result?
Thanks in advance.
Marc Robert
Best Answer
Keiji Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 52 Dataiker
Hello @MRvLuijpen,
Thank you so much for posting a question on Dataiku Community.
A seed cannot be set for the "Random subset of column(s) value" mode. However, the behaviour of this mode is actually deterministic: the results will be the same as long as the input data and the recipe's settings are not changed, so a seed is not needed for reproducibility.
I hope this helps. Please let us know if you have any further questions about this topic.
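To illustrate how a "random" split on column values can be deterministic without a seed, here is a minimal sketch in plain Python. It assumes a hash-based assignment, which is only one common way to get this behaviour and is not necessarily how the Split recipe is implemented. The function name `bucket_for_value`, the `customer_id` column and the sample rows are hypothetical.

```python
import hashlib


def bucket_for_value(value, first_ratio=0.8):
    """Assign a column value to an output using a stable hash (illustrative only)."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    # Turn the first 8 hex digits into a number in [0, 1).
    fraction = int(digest[:8], 16) / 2**32
    return "output_1" if fraction < first_ratio else "output_2"


# Hypothetical input rows; several rows share the same customer_id.
rows = [
    {"customer_id": "A", "amount": 10},
    {"customer_id": "B", "amount": 20},
    {"customer_id": "A", "amount": 30},
    {"customer_id": "C", "amount": 40},
    {"customer_id": "B", "amount": 50},
]

# Running the assignment twice on unchanged input gives identical results.
first_run = [bucket_for_value(row["customer_id"]) for row in rows]
second_run = [bucket_for_value(row["customer_id"]) for row in rows]
```

In this sketch, all rows sharing a column value land in the same output, and re-running on the same data with the same ratio reproduces the same dispatch.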
Sincerely,
Keiji
Answers
MRvLuijpen Partner, L2 Admin, L2 Designer, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Frontrunner 2022 Participant, Neuron 2023 Posts: 107 Neuron
Hello @KeijiY,
Thank you for your response. It is clear.
Would it be an idea to add the option of setting a seed for this mode?
For our business case we would like to set a seed in order to create multiple random, but different, subsets.
For reference: the random_state parameter of sklearn's GroupShuffleSplit.
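To make the request concrete, here is a minimal sketch in plain Python of the behaviour we have in mind: the same seed always gives the same subsets, while a different seed can give a different dispatch. The function `seeded_group_split` and the sample rows are hypothetical and are not part of any Dataiku API; the `seed` argument only plays a role similar to random_state.

```python
import random


def seeded_group_split(rows, key, ratio, seed):
    """Split rows into two lists so that all rows sharing a key value stay together."""
    # Sort the distinct values first so the result does not depend on row order.
    groups = sorted({row[key] for row in rows})
    rng = random.Random(seed)  # local generator, reproducible for a given seed
    rng.shuffle(groups)
    n_first = round(len(groups) * ratio)
    first_groups = set(groups[:n_first])
    first = [row for row in rows if row[key] in first_groups]
    second = [row for row in rows if row[key] not in first_groups]
    return first, second


# Hypothetical data: 10 customers with 3 rows each.
rows = [{"customer_id": f"C{i}", "visit": j} for i in range(10) for j in range(3)]

subset_a, rest_a = seeded_group_split(rows, "customer_id", ratio=0.5, seed=1)
subset_a_again, _ = seeded_group_split(rows, "customer_id", ratio=0.5, seed=1)
subset_b, rest_b = seeded_group_split(rows, "customer_id", ratio=0.5, seed=2)
```

Calling the function with several seeds would give us the multiple random subsets we need, each of them reproducible.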
Thanks again.
Sincerely,
Marc Robert