No additional error info when splitting / filtering dataset

ebbingcasa

Hi everyone,

I'm fairly new to Dataiku, but my first flows have worked fine. Now, when I try to split a dataset into two new ones using two filters that include regex, I get an error message with no further information. Does anyone have any pointers? Could it be a size issue for some reason (8 columns, 470k rows)?

Screenshot attached.

Thanks!

Best,
Peter

PS: The same thing happens if I delete all the filtering conditions and keep only a single "column equals string" condition. The same setup works fine on another dataset with similar data.

Answers

  • ebbingcasa

    It seems to have happened because of the size of the dataset. Does that make sense?

  • tgb417

    @ebbingcasa

    Welcome to the Dataiku Community. Great to have you here.

    I'm not aware of an overall data size limit. I've definitely worked with multi-million-row datasets on a laptop computer.

    Are you able to pull the logs from the job you ran? The screenshot you provided doesn't give me enough context to tell exactly what you were doing when this error showed up.

    There are definitely some limits in place to keep you from using up all of your RAM for the sample you are working with. These can be raised from 100 MB to 500 MB in the sample settings panel hidden on the right side of your dataset. There are also limits you can adjust under Administration -> Settings. Administration can be accessed from the tic-tac-toe (noughts and crosses) icon on the upper right side of the DSS screen.

    --Tom

  • ebbingcasa

    Thank you, Tom. I can't replicate the issue today. I'll report back if it happens again; the ticket can be closed until then. Maybe I was too impatient and crashed something at some point? One error in the logs states "This job could not be executed because some datasets were already being built."
    I don't know. Thank you for the context, everything helps at this point!

  • tgb417

    @ebbingcasa

    That error suggests that another job was already running and building the dataset you were working on. That would explain why it works now: the other process has finished.

    Glad that things have worked out for you.

    --Tom
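
For anyone who hits the same "some datasets were already being built" message: one way to confirm the explanation above is to look for jobs that are still in progress before re-running the recipe. The snippet below is only a rough sketch, not an official recipe. It filters a list of job records, and it assumes each record is a dict with a `state` key and a `def` dict holding an `id`. The helper names (`active_jobs`, `describe`), the set of "active" state names, and the sample records are all made up for illustration. Inside DSS, such a list would presumably come from something like `dataiku.api_client().get_default_project().list_jobs()`, but check the record layout against your own instance.

```python
from typing import Any, Dict, List

# Assumption: state names that mean "this job has not finished yet".
# Check these against the job records returned by your own DSS instance.
ACTIVE_STATES = {"NOT_STARTED", "PENDING", "RUNNING"}


def active_jobs(jobs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the job records whose state is one of ACTIVE_STATES."""
    return [job for job in jobs if job.get("state") in ACTIVE_STATES]


def describe(job: Dict[str, Any]) -> str:
    """Build a one-line summary; the 'def' -> 'id' layout is an assumption."""
    job_id = job.get("def", {}).get("id", "<unknown id>")
    return f"{job_id}: {job.get('state')}"


if __name__ == "__main__":
    # Hypothetical sample records, standing in for a real job listing.
    sample = [
        {"def": {"id": "job_a"}, "state": "DONE"},
        {"def": {"id": "job_b"}, "state": "RUNNING"},
        {"def": {"id": "job_c"}, "state": "FAILED"},
    ]
    for job in active_jobs(sample):
        print(describe(job))
```

If this prints any job that builds the split recipe's output datasets, waiting for it to finish (or aborting it) before re-running should avoid the error.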
