Upload a file to a partitioned Managed Folder
Hi!
With a Python recipe, I am trying to upload a CSV to a managed folder partitioned by year-month-day.
I have tried everything I can think of, and each time I get this error:
Invalid partitioning: in running compute_AAAAAAA_2022-05-30: Partitioning scheme is not representable as folders
At the moment, my script looks like this:
# Write recipe outputs
output_folder = dataiku.Folder("AAAAAAAA")
filepath = (datetime.datetime.now() - datetime.timedelta(days = 1)).strftime('%Y-%m-%d') + "/"
filename = 'DAILY_' + (datetime.datetime.now() - datetime.timedelta(days = 1)).strftime('%Y-%m-%d') + '.csv'
df.to_csv(filename, encoding='utf-8')
output_folder.put_file(filepath, filename)
Any help would be greatly appreciated, as I've been banging my head against the wall for quite some time now on this.
Thomas
Operating system used: MacOS
Answers
-
Ignacio_Toledo
Hello @Thomas_il,
Some questions before trying to help you:
- Do you have a source dataset as an input to your Python recipe, or is it a Python recipe with only an output folder?
- Do you want one folder for each daily file you upload, or do you want all the files in the same folder, following the filename pattern 'DAILY_%Y-%m-%d.csv'?
With that clear, I think I can give you some ideas.
Cheers
-
Hello @Ignacio_Toledo
It is indeed a Python recipe with no input (I am using Paramiko to pull data from an SFTP server).
Regarding the pattern, both choices are OK for me. The point of the partitioned folder would be to pull the latest file from it, using max(folder.list_partitions()).
Thank you for your time and help!
Thomas
-
Ignacio_Toledo
Hi @Thomas_il,
Given your use case, I think you don't need a partitioned folder in principle: if you write all your files with the name format 'DAILY_%Y-%m-%d.csv' in the root of the output_folder, you can use a few lines of Python to get the latest file written:
import glob
import os

output_folder = dataiku.Folder("AAAAAAAA")
list_of_files = glob.glob(output_folder.get_path() + '/*.csv')
latest_file = max(list_of_files, key=os.path.getctime)
print(latest_file)
I think this is the simplest option. However, learning to use partitioned folders and datasets is useful, so if you are interested in the alternative solution, let me know and I can share another example.
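A possible variant, in case the folder is not stored on the local filesystem (where get_path() is not available) or the file timestamps are not reliable: pick the latest file from the date embedded in its name instead of from os.path.getctime. This is only a sketch. The helper name latest_daily_path is made up for illustration, and the usage lines assume that dataiku.Folder.list_paths_in_folder() returns paths such as "/DAILY_2022-05-30.csv".

```python
import datetime
import re

# Matches names such as "DAILY_2022-05-30.csv" and captures the date part.
DAILY_PATTERN = re.compile(r"DAILY_(\d{4}-\d{2}-\d{2})\.csv$")


def latest_daily_path(paths):
    """Return the path whose file name carries the most recent date, or None.

    Hypothetical helper: it only looks at the names, never at the files.
    """
    dated = []
    for path in paths:
        match = DAILY_PATTERN.search(path)
        if match is None:
            continue  # skip files that do not follow the naming pattern
        try:
            day = datetime.datetime.strptime(match.group(1), "%Y-%m-%d").date()
        except ValueError:
            continue  # skip names that look right but hold an impossible date
        dated.append((day, path))
    if not dated:
        return None
    # Compare on the parsed date only and return the matching path.
    return max(dated, key=lambda item: item[0])[1]


# Usage inside a DSS Python recipe (assumed, not run here):
# import dataiku
# folder = dataiku.Folder("AAAAAAAA")
# print(latest_daily_path(folder.list_paths_in_folder()))
```

Because the date comes from the file name, re-uploading an older file later does not change which one counts as the latest.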
Cheers
-
Thanks for this solution, it works perfectly.
Indeed, I would be very happy to learn how to do this with partitioned folders! I will need those in the future, and knowing how they work would be great.
Again, thanks for your time and help.
Thomas
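For the partitioned variant asked about here, one possible sketch follows. It rests on several assumptions that the thread does not confirm: that the managed folder is partitioned by day with a partitioning pattern along the lines of /%Y/%M/%D/.* in its settings, that files written under matching sub-paths are then picked up as partitions, and that the partition identifiers come back as 'YYYY-MM-DD' strings. Whether a different pattern is what triggers the "not representable as folders" error is also not established here. The helper names partitioned_path and upload_daily_csv are made up for illustration; upload_data and list_partitions are the dataiku.Folder methods the usage lines rely on.

```python
import datetime


def partitioned_path(day):
    """Build the in-folder path for one day's file.

    Assumes a folder partitioning pattern like /%Y/%M/%D/.* (an assumption,
    check the folder settings), so the file goes under /YYYY/MM/DD/.
    """
    return day.strftime("/%Y/%m/%d/") + "DAILY_" + day.strftime("%Y-%m-%d") + ".csv"


def upload_daily_csv(folder, df, day):
    """Serialize df to CSV in memory and upload it under the day's sub-path.

    `folder` is expected to behave like dataiku.Folder (it needs upload_data),
    `df` like a pandas DataFrame (it needs to_csv). Returns the path used.
    """
    data = df.to_csv(index=False).encode("utf-8")
    path = partitioned_path(day)
    folder.upload_data(path, data)
    return path


# Usage inside a DSS Python recipe (assumed, not run here):
# import dataiku
# folder = dataiku.Folder("AAAAAAAA")
# yesterday = datetime.date.today() - datetime.timedelta(days=1)
# upload_daily_csv(folder, df, yesterday)
# latest = max(folder.list_partitions())  # ISO dates sort chronologically as strings
```

Writing the CSV from memory avoids creating a temporary local file, and max() works on the partition list only because zero-padded ISO dates compare in chronological order as plain strings.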