Partitioned Managed Folder S3 Enumerate takes a lot of time
Hi Dataiku Team,
I have an S3 managed folder that is partitioned at the day level.
My process pushes data into the partition for a particular day.
The challenge I face is that the actual code to push the data runs in a few seconds,
but in the log I see that afterwards Dataiku runs an S3 Enumerate process, first on the partition subfolder, which takes a few seconds,
and then on the main folder, which takes almost 20 minutes.
I have a lot of partitions, and the number keeps increasing every day. Is there any way I can stop the second enumeration from running?
Any suggestions are welcome.
Best Answer
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi @NN,
Depending on the actual recipe being run, you can disable this by not including the folder as an input to your Python recipe and adding ignore_flow=True to the dataiku.Folder() call.
https://doc.dataiku.com/dss/latest/python-api/managed_folders.html#dataiku.Folder
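As a rough illustration of the suggestion above, here is a minimal sketch of opening the folder with ignore_flow=True and writing one file into a day's subfolder. The helper names (open_folder_outside_flow, partition_path, push_daily_file), the folder name "my_s3_folder", and the one-subfolder-per-day path layout are assumptions made for the example; adapt the path to the partitioning pattern configured on your folder. The dataiku import is kept inside the helper because the package is only available inside DSS.

```python
from datetime import date


def open_folder_outside_flow(folder_name, project_key=None):
    """Open a managed folder that is NOT declared in the recipe's inputs/outputs."""
    import dataiku  # only importable inside a DSS environment

    # ignore_flow=True lets the recipe access a folder it does not declare
    return dataiku.Folder(folder_name, project_key=project_key, ignore_flow=True)


def partition_path(day, filename):
    # Assumed layout: one subfolder per day, e.g. /2024-01-31/data.csv
    return "/{}/{}".format(day.isoformat(), filename)


def push_daily_file(folder, day, filename, payload):
    """Upload raw bytes into the subfolder for the given day and return the path used."""
    path = partition_path(day, filename)
    folder.upload_data(path, payload)
    return path


# Inside a DSS Python recipe (hypothetical folder name):
# folder = open_folder_outside_flow("my_s3_folder")
# push_daily_file(folder, date.today(), "data.csv", b"col_a,col_b\n1,2\n")
```

Because the folder is no longer declared on the recipe, the recipe will likely need some other declared output, and the Flow will no longer show the dependency on this folder.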
Kind Regards,
Answers
Thank you @AlexT. This solution worked, and the process has been running fine since.