Select files depending on the timestamp without Python code

Fazz Registered Posts: 2 ✭✭✭

I have an Analyst license.

I have a Box connection set up in my Dataiku version 9.x.

How can I read the latest file from Box into Dataiku when the filename does not contain a timestamp in YYYYMMDDHH format?

The file is uploaded by the automation process from a different ETL tool. Therefore, the filename is random every time without any YYYYMMDDHH format, something like below:

Test file Mchskhfkjkj.xlsx

Test file Yhsjdkasdk.xlsx

Test file Uhfkhsdkjh.xlsx

Since I do not have the option to write any code, is there a workaround to solve this issue?

Can you please suggest one?




  • Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,717 Neuron

    I believe it is not possible to do what you want with an Analyst license. Here are a few options:

    1. The most obvious solution is to have your ETL process always use the same file name, so that you do not need to identify the latest file at all.
    2. The next easiest solution is to have your ETL process include YYYYMMDDHH in the filename. With this option you could use a Files in Folder dataset, then use the Enrich with record context processor in a Prepare visual recipe to add the file name as a column. Then you can use a Top N visual recipe to identify the latest filename, and finally use that in a Filter recipe to remove the rows from the older files.
    3. Have your Dataiku Administrator create a simple Python plugin that does what you need. I am not 100% sure whether Plugin recipes can be used by Analysts, but you can check your recipe options to see if plugins are listed there too.
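
For whoever ends up writing the plugin in option 3, the selection logic itself is small. Below is a minimal sketch in plain Python, not a tested Dataiku plugin. The function names `latest_by_mtime` and `latest_by_name_stamp` are hypothetical, and the sketch assumes you already have each file's name and last-modified time (in a Python recipe these would presumably come from the managed folder API, which I have not verified against version 9.x, so that part is left out). The second function mirrors option 2 and assumes the ETL tool puts a 10-digit YYYYMMDDHH stamp somewhere in the filename.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical helpers for illustration only; none of this is a Dataiku API.


def latest_by_mtime(files: Iterable[Tuple[str, float]]) -> Optional[str]:
    """Return the name with the largest last-modified time, or None if empty.

    `files` is assumed to be (name, last_modified) pairs, where last_modified
    is any comparable number such as epoch seconds.
    """
    files = list(files)
    if not files:
        return None
    return max(files, key=lambda item: item[1])[0]


# Assumption: the stamp is exactly 10 consecutive digits (YYYYMMDDHH).
_STAMP = re.compile(r"\d{10}")


def latest_by_name_stamp(names: Iterable[str]) -> Optional[str]:
    """Return the name with the greatest YYYYMMDDHH stamp, or None if none match.

    Fixed-width digit strings sort chronologically, so a plain string
    comparison is enough. Names without a stamp are ignored.
    """
    stamped = []
    for name in names:
        match = _STAMP.search(name)
        if match:
            stamped.append((match.group(0), name))
    if not stamped:
        return None
    return max(stamped)[1]
```

With random filenames like the ones in the question, only `latest_by_mtime` is usable, since the names carry no ordering information. If the ETL process can be changed to add the stamp, `latest_by_name_stamp` does the same job as the Top N step in option 2.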
