S3 path details whenever a new file is received
Hi Team,
I am creating a process that has to handle S3 files. Is there any way to get the complete S3 path whenever a new file is added?
We can create a managed folder that holds the S3 files, and in a Scenario we can trigger the process on a managed folder change, but can we get the file path details?
Is there any way to do this?
Thanks in advance
Answers
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi @sj0071992,
If I understand correctly, you are looking for the scenario trigger to provide the path of the file that triggered the scenario run?
The full path of the file or files that changed since the last trigger is not directly available to the scenario, because of how changes are currently tracked (see the explanation here).
Even if this information were available directly, how would you use it?
The parameters/variables within the scenario can be retrieved like so:
from dataiku.scenario import Scenario
s = Scenario()
trigger_params = s.get_all_variables()
print(trigger_params)
This will contain the project key and managed folder ID that was modified, but not which files or folders within it were changed.
'scenarioTriggerParam_modified': '["TESTING_S3_NEW.livm8UXE.NP"]'
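As a rough sketch, the value above can be decoded in a Python step to recover the folder ID. The value is a JSON-encoded list stored as a string; the PROJECT_KEY.FOLDER_ID.PARTITION layout is only inferred from this one sample, and the helper name parse_modified_param is hypothetical.

```python
import json


def parse_modified_param(raw):
    """Decode the JSON list held in 'scenarioTriggerParam_modified'.

    Assumption: each entry looks like PROJECT_KEY.FOLDER_ID.PARTITION,
    as in the sample value; this layout is inferred, not documented here.
    """
    refs = []
    for entry in json.loads(raw):
        # Split on the first two dots only, in case the last part contains dots.
        project_key, folder_id, partition = entry.split(".", 2)
        refs.append(
            {"project": project_key, "folder": folder_id, "partition": partition}
        )
    return refs


sample = '["TESTING_S3_NEW.livm8UXE.NP"]'
print(parse_modified_param(sample))
```

The resulting folder ID could then be passed to dataiku.Folder to list the folder contents.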
To actually get the full paths of files added since the last trigger, you could save a project variable on every run holding a timestamp (e.g. epoch) of the last file that was processed. On the next run, compare that timestamp with the modified date of each file in the folder and retrieve the full path of every file modified after it.
Here is a code snippet that retrieves the information you are looking for. You can adapt it to your needs and use it in a scenario Python step.
import dataiku
from dataiku import pandasutils as pdu
import pandas as pd
import time

folder_id = "G8glecnp"
input_folder = dataiku.Folder(folder_id)
# Current time in epoch milliseconds, the same unit as 'lastModified'
current_epoch = int(time.time()) * 1000
# Each child entry describes one item at the root of the managed folder
for item in input_folder.get_path_details()["children"]:
    print(item)
    print(item['lastModified'])
    print(current_epoch)
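The timestamp comparison described above could be sketched roughly as follows. This is a minimal illustration on plain dictionaries, not a definitive implementation: the helper name files_modified_after is hypothetical, and the 'fullPath' and 'directory' keys are assumed to be present in each child entry alongside 'lastModified'.

```python
def files_modified_after(children, last_processed_ms):
    """Return (paths, newest_ms) for files modified after last_processed_ms.

    Assumption: each child is a dict with 'fullPath' and 'lastModified'
    (epoch milliseconds), and optionally a boolean 'directory' flag.
    """
    new_items = [
        c for c in children
        if not c.get("directory", False) and c["lastModified"] > last_processed_ms
    ]
    # Oldest first, so files are handled in arrival order.
    new_items.sort(key=lambda c: c["lastModified"])
    paths = [c["fullPath"] for c in new_items]
    # Keep the old watermark when nothing new arrived.
    newest_ms = new_items[-1]["lastModified"] if new_items else last_processed_ms
    return paths, newest_ms


# Hypothetical sample data standing in for get_path_details()["children"]
sample_children = [
    {"fullPath": "/a.csv", "lastModified": 1000, "directory": False},
    {"fullPath": "/b.csv", "lastModified": 3000, "directory": False},
    {"fullPath": "/sub", "lastModified": 4000, "directory": True},
]
print(files_modified_after(sample_children, 2000))
```

In a scenario step, the children list would come from input_folder.get_path_details()["children"], and the returned timestamp would be written back to the project variable for the next run.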
Hope this helps!
-
degananda264 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 5 ✭
Hi @AlexT
How can I keep monitoring a OneDrive folder?
Whenever a file is uploaded to the OneDrive folder, I want to read it in a Python recipe and do some data engineering.
Could you please help me with this?
Thanks in advance
Degananda