Folder connected to S3 bucket is enumerating all files by default - memory leak

Talb27
Talb27 Dataiku DSS Core Designer, Registered Posts: 3 ✭✭✭

Hi,

I have a Folder in my pipeline that is connected to an S3 bucket containing millions of files.

I noticed an odd behaviour while running a Python recipe: whenever this folder is an input of a recipe, all of its files are enumerated by default (even if I don't create the folder object in code). Since there are millions of files, the build takes ages before running into a memory leak!
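To illustrate why eager enumeration hurts at this scale, here is a minimal, self-contained sketch. It does not use the Dataiku or S3 APIs: `fake_list_page` is a hypothetical stand-in for a paged listing call, and the key names and page sizes are made up. It only contrasts materialising every key up front with iterating lazily, and it is not the fix for the default behaviour described above.

```python
from itertools import islice
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page type: (keys on this page, token for the next page or None).
Page = Tuple[List[str], Optional[int]]


def fake_list_page(token: Optional[int], total: int = 10_000, page_size: int = 1_000) -> Page:
    """Stand-in for a paged object-store listing call (not a real S3 or Dataiku API)."""
    start = token or 0
    end = min(start + page_size, total)
    keys = [f"data/file_{i:07d}.csv" for i in range(start, end)]
    return keys, (end if end < total else None)


def iter_keys(list_page: Callable[[Optional[int]], Page]) -> Iterator[str]:
    """Yield keys page by page, so only one page is held in memory at a time."""
    token: Optional[int] = None
    while True:
        keys, token = list_page(token)
        yield from keys
        if token is None:
            return


# Eager enumeration: every key is materialised before any work starts.
all_keys = list(iter_keys(fake_list_page))

# Lazy enumeration: stop after the first 5 keys, so only the first page is requested.
first_five = list(islice(iter_keys(fake_list_page), 5))
```

With millions of objects, the eager form pays for both the listing time and the memory of the full key list, which matches the symptoms described here.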

Does anyone know how to suppress this default behaviour for S3 buckets?

read_bucket_ng.png

Best regards,
Talb27


Operating system used: Windows 10

Best Answer

Answers

  • Talb27
    Talb27 Dataiku DSS Core Designer, Registered Posts: 3 ✭✭✭

    Thank you so much @AlexT!
    This perfectly solved our problem.

    Best regards,
    Talb27

  • Tanguy
    Tanguy Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2023, Circle Member Posts: 141 Neuron

    Thanks for preventing me from crashing our Dataiku server.
