
Partitioned Managed Folder S3 Enumerate takes a lot of time

Solved!
NN
Neuron

Hi Dataiku Team,
I have an S3 managed folder that is partitioned at the day level.
My process pushes data into the partition for the particular day.

The challenge I face is that the actual code to push the data runs in a few seconds,
but in the log I see that afterwards Dataiku runs an S3 Enumerate process, first on the partition subfolder, which takes a few seconds,
and then on the main folder, which takes almost 20 minutes.
I have a lot of partitions, and the number keeps increasing every day. Is there any way I can stop the second computation from running?

Any suggestions are welcome 🙂 

1 Solution
AlexT
Dataiker

Hi @NN ,

Depending on the actual recipe that is running, you can disable this by not including the folder as an input to your Python recipe and adding ignore_flow=True to the dataiku.Folder() call.

https://doc.dataiku.com/dss/latest/python-api/managed_folders.html#dataiku.Folder
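A minimal sketch of what that could look like inside a Python recipe. The folder name, file name, and the date-based path layout are assumptions for illustration only; adjust them to your folder's actual partitioning pattern. The dataiku package is imported inside the function because it is only available inside DSS.

```python
from datetime import date

# Hypothetical folder name, for illustration only.
FOLDER_NAME = "daily_exports"


def partition_path(day: date, filename: str) -> str:
    """Build the path of a file inside a day-level partition.

    Assumes the partition subfolder is named after the ISO date
    (e.g. /2024-01-05/); your partitioning pattern may differ.
    """
    return f"/{day.isoformat()}/{filename}"


def push_daily_file(day: date, filename: str, payload: bytes) -> None:
    """Write one file into the day's partition without declaring
    the folder in the recipe's inputs/outputs."""
    import dataiku  # only importable inside DSS

    # ignore_flow=True lets the recipe access a folder that is not
    # declared as one of its inputs or outputs in the Flow.
    folder = dataiku.Folder(FOLDER_NAME, ignore_flow=True)
    folder.upload_data(partition_path(day, filename), payload)
```

Inside the recipe this would be called as, for example, push_daily_file(date.today(), "data.csv", b"..."). Because the folder is no longer wired into the recipe in the Flow, DSS will not track this write as a dependency, so downstream steps that rely on the folder need to be triggered another way.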

Kind Regards,

 

NN
Neuron
Author

Thank you @AlexT. This solution works for now and the process has been running fine.