Creating FTP Datasets

alain008 Dataiku DSS Core Designer, Registered Posts: 1

Hello Dataiku Community.

I am connecting to an FTP folder to build a dataset.

The dataset I get is the union of the data from all the files in that directory.

What I want is to process only the new files arriving in that directory. I looked at the advanced options, but I couldn't find any documentation on the expression that could be used to do that.

I appreciate your help.

Thank you

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,737 Neuron

    There is no built-in way to load only new files, so you will have to develop custom Python code. Ideally, you would have subfolders within your folder and move files as you process them. For instance, files arrive in ./landing, get moved to ./processing for loading, and are moved to ./loaded after a successful load. That way you always know which files you have processed, and if a file fails to load, you have it right there in the ./processing folder, so you can fix the code or the file and retry the same file. This post will give you some guidance, but you will need Python skills to achieve this requirement.
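
The landing / processing / loaded flow described above could be sketched roughly as follows. This is only a minimal illustration using Python's standard ftplib, not a Dataiku API: the function name process_new_files, the load_file callback, and the folder names are hypothetical, and it assumes the landing folder contains only plain files and that the FTP server allows renaming across folders.

```python
import io
import posixpath


def process_new_files(ftp, load_file,
                      landing="landing", processing="processing", loaded="loaded"):
    """Move each file landing -> processing -> loaded, loading it in between.

    ftp is expected to behave like a connected ftplib.FTP object.
    load_file(name, data) is a hypothetical callback that loads one file
    and raises an exception on failure.
    Returns the names of the files that were loaded successfully.
    """
    done = []
    # Depending on the server, nlst returns bare names or full paths,
    # so keep only the base name of each entry.
    names = sorted(posixpath.basename(p) for p in ftp.nlst(landing))
    for name in names:
        if name in ("", ".", ".."):
            continue
        work_path = posixpath.join(processing, name)
        # Claim the file by moving it out of the landing folder.
        ftp.rename(posixpath.join(landing, name), work_path)
        # Download the file content into memory.
        buf = io.BytesIO()
        ftp.retrbinary("RETR " + work_path, buf.write)
        try:
            load_file(name, buf.getvalue())
        except Exception:
            # Leave the file in the processing folder so it can be
            # inspected, fixed, and retried later.
            continue
        # Only a successful load moves the file to the loaded folder.
        ftp.rename(work_path, posixpath.join(loaded, name))
        done.append(name)
    return done
```

To try it, you would pass a logged-in ftplib.FTP connection and your own load function. Note that this sketch does not retry files left in the processing folder by an earlier failed run; that would need a separate step.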
