Folder connected to S3 bucket is enumerating all files by default - memory leak

Talb27

Hi,

I have a Folder in my pipeline that is connected to an S3 bucket containing millions of files.

I noticed an odd behaviour while running a Python recipe: whenever this folder is an input of a recipe, DSS enumerates all of its files by default (even if I never create the folder object in code). Since there are millions of files, the build takes ages and eventually runs out of memory!
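For illustration, this is roughly the access pattern I would expect to use without triggering a full listing of the bucket; the folder ID and file path below are placeholders, not my actual setup:

import dataiku

# Placeholder folder ID; in the real recipe only a few known paths are needed,
# not an enumeration of the whole bucket.
folder = dataiku.Folder("my_s3_folder")

# Read one specific file by path instead of listing everything.
with folder.get_download_stream("some/prefix/file.csv") as stream:
    data = stream.read()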

Does anyone know how to suppress this default behaviour for S3 buckets?

(Attached screenshot: read_bucket_ng.png)

Best regards,
Talb27


Operating system used: Windows 10
