Select files depending on the timestamp without Python code
Fazz
I have an Analyst license.
I have a Box connection set up in my Dataiku 9.x instance.
How can I read the latest file from Box into Dataiku when the filename does not contain a YYYYMMDDHH timestamp?
The file is uploaded by an automated process in a different ETL tool, so the filename is random every time and has no YYYYMMDDHH format, for example:
Test file Mchskhfkjkj.xlsx
Test file Yhsjdkasdk.xlsx
Test file Uhfkhsdkjh.xlsx
Since I do not have the option to write any code, is there a workaround for this issue?
Any suggestions would be appreciated.
Regards,
Fazz
Answers
Turribeach
I believe it is not possible to do what you want with an Analyst license, so here are a few options:
- The most obvious solution is to have your ETL process always use the same file name, so that you never need to work out which file is the latest.
- The next easiest solution is to have your ETL process include YYYYMMDDHH in the filename. You could then use a Files in Folder dataset, use the Enrich with record context processor in a Prepare visual recipe to add the file name as a column, use a Top N visual recipe to identify the latest file name, and finally use that file name in a Filter recipe to remove the rows from the older files.
- Have your Dataiku administrator create a simple Python plugin that does what you need. I am not 100% sure whether plugin recipes can be used by Analysts, but you can check your recipe options to see whether plugins are listed there.
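To make the logic of the second option concrete, here is a minimal Python sketch of what the Prepare, Top N and Filter recipes achieve together. It is only an illustration of the idea, not something an Analyst would need to run. It assumes the ETL tool appends a 10-digit YYYYMMDDHH stamp to each file name, and the `filename` column name, `stamp_of` and `keep_latest_file_rows` are hypothetical names.

```python
import re

# Assumed naming convention: a 10-digit YYYYMMDDHH stamp somewhere in the name,
# e.g. "Test file 2023011509.xlsx".
STAMP = re.compile(r"(\d{10})")


def stamp_of(filename):
    """Return the YYYYMMDDHH stamp in a file name, or None if there is none."""
    match = STAMP.search(filename)
    return match.group(1) if match else None


def keep_latest_file_rows(rows, filename_key="filename"):
    """Keep only the rows that came from the most recently stamped file.

    `filename_key` is the (hypothetical) column added by "Enrich with record
    context"; taking the max stamp plays the role of Top N with N = 1, and the
    final list comprehension plays the role of the Filter recipe.
    """
    stamped = [r for r in rows if stamp_of(r[filename_key]) is not None]
    if not stamped:
        return []
    # Zero-padded YYYYMMDDHH strings sort chronologically, so max() is the latest.
    latest = max(stamped, key=lambda r: stamp_of(r[filename_key]))[filename_key]
    return [r for r in rows if r[filename_key] == latest]


rows = [
    {"filename": "Test file 2023011409.xlsx", "value": 1},
    {"filename": "Test file 2023011509.xlsx", "value": 2},
    {"filename": "Test file 2023011509.xlsx", "value": 3},
]
print([r["value"] for r in keep_latest_file_rows(rows)])  # prints [2, 3]
```

This also shows why the option depends on changing the ETL process: with names such as "Test file Mchskhfkjkj.xlsx" there is no stamp to sort on, and the function returns an empty list.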
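For the third option, the core of such a plugin could be as small as picking the file with the newest modification time, since the random names carry no ordering. The sketch below assumes the plugin has already collected (path, last-modified) pairs, for example through the managed folder API (`dataiku.Folder(...).list_paths_in_partition()` and `get_path_details()`, which should be checked against the API docs for your DSS version). The function name `latest_file`, the millisecond timestamp format and the example timestamps are all hypothetical.

```python
def latest_file(files, suffix=".xlsx"):
    """Return the path of the most recently modified file, or None.

    `files` is a list of (path, last_modified_ms) pairs, where last_modified_ms
    is assumed to be milliseconds since the Unix epoch. Only paths ending in
    `suffix` (case-insensitive) are considered.
    """
    candidates = [(path, ts) for path, ts in files if path.lower().endswith(suffix)]
    if not candidates:
        return None
    # The file name is random, so the ordering relies on the timestamp alone.
    path, _ = max(candidates, key=lambda item: item[1])
    return path


# Illustrative metadata only; a real plugin would read it from the Box folder.
files = [
    ("/Test file Mchskhfkjkj.xlsx", 1673773200000),
    ("/Test file Yhsjdkasdk.xlsx", 1673859600000),
    ("/Test file Uhfkhsdkjh.xlsx", 1673816400000),
]
print(latest_file(files))  # prints /Test file Yhsjdkasdk.xlsx
```

Relying on modification time assumes the ETL tool never re-touches older files; if it does, a naming convention like the one in the second option is more robust.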