No additional error info when splitting / filtering dataset

ebbingcasa
Hi everyone,

I'm fairly new to Dataiku, but my first flows have been working fine. Now, when I try to split a dataset into two new ones using two filters (including a regex), I get an error message with no further information. Does anyone have any pointers? Could it be a size issue for some reason (8 columns, 470k rows)?

Screenshot attached.

Thanks!

Best,
Peter

PS: It's the same if I delete all filtering conditions and only keep a column equals string condition. It works fine on another dataset with similar data.

4 Replies
ebbingcasa

Seems like it happened due to the size of the dataset. Does that make sense?

tgb417

@ebbingcasa 

Welcome to the Dataiku Community.  Great to have you here.

I'm not aware of an overall data size limit.  I've definitely worked with multi-million-row datasets on a laptop computer.   

Are you able to pull the logs from the job you ran? The screenshot you provided doesn't give me enough context to tell exactly what you were doing when this error showed up.

There are definitely some limits in place to keep the sample you are working with from using up all of your RAM. The sample limit can be raised from 100 MB to 500 MB in the sample screen hidden on the right side of your dataset. There are also limits you can adjust under Administration -> Settings. Administration can be accessed from the tic-tac-toe (noughts and crosses) icon in the upper right of the DSS screen.

--Tom
ebbingcasa

Thank you, Tom. I can't replicate the issue today. I'll report back if it happens again; the ticket can be closed until then. Maybe I was too impatient and crashed something at some point? One error in the logs states "This job could not be executed because some datasets were already being built."
I don't know. Thank you for the context, everything helps at this point!

tgb417

@ebbingcasa 

That error suggests that another job was already running and building the same datasets you were working on. That would also explain why it works now: the other job has finished.
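
If it happens again, one way to check for a job that is still building is to look through the project's job list for entries that have not finished. Below is a minimal sketch in plain Python, not an official recipe. It assumes each job is a dict with a boolean `stableState` and its id under `def` -> `id`; those field names are my assumption about what `project.list_jobs()` in the DSS Python API returns, so please verify them on your version. The sample data and job ids are made up.

```python
from typing import Any, Dict, List


def unfinished_jobs(jobs: List[Dict[str, Any]]) -> List[str]:
    """Return the ids of jobs that have not yet reached a finished state."""
    pending = []
    for job in jobs:
        # ASSUMPTION: "stableState" is False while a job is still running,
        # and the job id lives under job["def"]["id"].
        if not job.get("stableState", True):
            pending.append(job.get("def", {}).get("id", "<unknown>"))
    return pending


# Hypothetical sample standing in for a real job list. Inside DSS the list
# might come from something like:
#   import dataiku
#   jobs = dataiku.api_client().get_project("MYPROJECT").list_jobs()
sample_jobs = [
    {"def": {"id": "build_split_A"}, "stableState": True},
    {"def": {"id": "build_split_B"}, "stableState": False},
]

print(unfinished_jobs(sample_jobs))
```

If the list is non-empty, waiting for those jobs to finish (or aborting them) before re-running the split should avoid the "already being built" error.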

Glad that things have worked out for you.

--Tom