Added on June 6, 2022 9:03AM
Hi,
I have a Folder in my pipeline that is connected to an S3 bucket containing millions of files.
I noticed an odd behaviour while running a Python recipe: whenever this folder is an input of a recipe, DSS enumerates all of its files by default (even if I never create the Folder object in my code). Since there are millions of files, the build takes ages before eventually running out of memory!
Does anyone know how to suppress this default behaviour for S3 buckets?
Best regards,
Talb27
Operating system used: Windows 10
Hi @Talb27,
To avoid enumerating the items under this S3 bucket path, pass ignore_flow=True when constructing the Folder object, i.e. dataiku.Folder("folder_name", ignore_flow=True), and do not add the folder as an input of the recipe.
If the folder was the recipe's only input/output, you will need to add at least a dummy output dataset or folder so the recipe remains valid in the Flow.
https://doc.dataiku.com/dss/latest/python-api/managed_folders.html#dataiku.Folder
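To illustrate, here is a minimal sketch of what the recipe code could look like. The folder name "my_s3_folder" and the file key "some/key.csv" are placeholders for your own values; this only runs inside a DSS Python recipe where the dataiku package is available.

```python
import dataiku

# Open the managed folder WITHOUT declaring it as a recipe input.
# ignore_flow=True tells DSS not to treat this as a Flow dependency,
# so the build does not enumerate the millions of files in the bucket.
folder = dataiku.Folder("my_s3_folder", ignore_flow=True)  # placeholder name

# Read a single known file directly by its path inside the folder,
# instead of listing the whole bucket. "some/key.csv" is hypothetical.
with folder.get_download_stream("some/key.csv") as stream:
    data = stream.read()
```

Because the folder is no longer a Flow input, DSS skips the up-front file enumeration entirely; you only pay for the objects you explicitly read.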
Let me know if that works for you.
Thank you so much @AlexT!
This perfectly solved our problem.
Talb27
Thanks for preventing me from crashing our Dataiku server!