Entering a seed for "Randomly dispatch data" on the output datasets of a Split recipe

MRvLuijpen

Hello Community.

In the Split recipe, we have the option to use the dispatch mode "Random subset of column(s) value".

My question: is it possible to set a seed for this mode, so that the 'random' result is always the same?

Thanks in advance.

Marc Robert

Best Answer

  • Keiji (Dataiker)

    Hello @MRvLuijpen,

    Thank you so much for posting a question on Dataiku Community.

    A seed cannot be set for the "Random subset of column(s) value" mode. However, the behaviour of this mode is actually deterministic: the results will be the same as long as the input data and the recipe's settings are not changed. So a seed is not needed to get reproducible results in this mode.

    I hope this helps. Please let us know if you have any further questions about this topic.

    Sincerely,
    Keiji
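
For readers wondering how a "random" dispatch can be deterministic without a seed, here is a minimal sketch in plain Python. It assumes the dispatch is driven by a hash of the column value. That is only an illustration of the general idea, not a description of how DSS implements this mode; the function name `dispatch` and the sample customer IDs are hypothetical.

```python
import hashlib

def dispatch(value, ratio=0.8):
    """Assign a column value to 'train' or 'test' using only a hash of the value.

    Illustrative assumption: this is NOT the confirmed DSS implementation.
    """
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    bucket = int(digest[:8], 16) / 16**8
    return "train" if bucket < ratio else "test"

customers = ["c1", "c2", "c3", "c1", "c4", "c2"]

# Two independent runs over the same input give the same assignment,
# and every occurrence of the same value lands in the same output.
first_run = [dispatch(c) for c in customers]
second_run = [dispatch(c) for c in customers]
```

Because the assignment depends only on the value itself and the ratio, rerunning it on unchanged data with unchanged settings cannot produce a different split.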

Answers

  • MRvLuijpen

    Hello @KeijiY,

    Thank you for your response. It is clear.

    Would it be an idea to add the option of setting a seed there?

    For our business case, we would like to vary the seed in order to create multiple random, but different, subsets.

    A reference: the random_state parameter of sklearn's GroupShuffleSplit.

    Thanks again.

    Sincerely,

    Marc Robert
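
Until such an option exists, one possible workaround is to do the split in code, for example in a Python recipe. The sketch below uses only the standard library and is in the spirit of a `random_state` parameter: the seed controls which column values are selected, and all rows sharing a value stay together. The function name `seeded_group_split`, the `customer` column, and the sample rows are hypothetical, and reading from or writing to DSS datasets is left out.

```python
import random

def seeded_group_split(rows, key, ratio=0.5, seed=0):
    """Split rows into two lists by randomly selecting a share of the distinct
    values of `key`; the selection is reproducible for a given seed."""
    # Sort first so the result does not depend on set iteration order.
    groups = sorted({row[key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_selected = round(len(groups) * ratio)
    selected = set(groups[:n_selected])
    inside = [row for row in rows if row[key] in selected]
    outside = [row for row in rows if row[key] not in selected]
    return inside, outside

# Hypothetical sample data: 40 orders spread over 10 customers.
rows = [{"customer": f"c{i % 10}", "order": i} for i in range(40)]

# Same seed, same subsets.
a_in, a_out = seeded_group_split(rows, "customer", ratio=0.5, seed=1)
b_in, b_out = seeded_group_split(rows, "customer", ratio=0.5, seed=1)

# A different seed will generally select a different set of customers.
c_in, c_out = seeded_group_split(rows, "customer", ratio=0.5, seed=2)
```

Calling the function with a list of seeds would give the multiple random, but different, subsets described above, each one reproducible from its seed.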
