Upload a file to a partitioned Managed Folder

Thomas_il
Level 2

Hi! 

In a Python recipe, I am trying to upload a CSV file to a managed folder that is partitioned by year-month-day:

[Screenshot 2022-05-30 at 18.17.57.png]

I have tried everything I can think of, and each time I get this error:

Invalid partitioning: in running compute_AAAAAAA_2022-05-30: Partitioning scheme is not representable as folders

At the moment, my script looks like this:

import datetime

import dataiku

# Write recipe outputs (df is built earlier in the recipe from the SFTP data)
output_folder = dataiku.Folder("AAAAAAAA")
yesterday = (datetime.datetime.now() - datetime.timedelta(days=1)).strftime('%Y-%m-%d')
filepath = yesterday + "/"                  # e.g. "2022-05-29/"
filename = 'DAILY_' + yesterday + '.csv'    # e.g. "DAILY_2022-05-29.csv"
df.to_csv(filename, encoding='utf-8')       # writes a local copy in the working directory
output_folder.put_file(filepath, filename)
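For comparison, here is a minimal sketch of the same write step that computes yesterday's date once and streams the CSV to the folder from memory instead of through a local file. It assumes two things you should check: that the folder's partitioning pattern is %Y/%M/%D/.* (one directory level per year, month and day), and that dataiku.Folder in your DSS version exposes upload_stream(path, file_like). The helpers daily_file_path and upload_csv_text are hypothetical names used only for illustration.

```python
import datetime
import io


def daily_file_path(day: datetime.date) -> str:
    # ASSUMPTION: the folder's partitioning pattern is %Y/%M/%D/.* (DSS tokens),
    # i.e. one directory per year/month/day; adapt if your pattern differs.
    # Path is relative to the folder root, e.g. "/2022/05/29/DAILY_2022-05-29.csv".
    return f"/{day:%Y/%m/%d}/DAILY_{day:%Y-%m-%d}.csv"


def upload_csv_text(folder, path: str, csv_text: str) -> None:
    # ASSUMPTION: `folder` exposes upload_stream(path, file_like), as
    # dataiku.Folder does in recent DSS versions. The bytes are streamed from
    # memory, so no temporary CSV is left in the recipe's working directory.
    folder.upload_stream(path, io.BytesIO(csv_text.encode("utf-8")))


# Compute yesterday's date once and reuse it for both the directory and the name.
yesterday = datetime.date.today() - datetime.timedelta(days=1)
```

In the recipe from the question this would be called as upload_csv_text(output_folder, daily_file_path(yesterday), df.to_csv()), since DataFrame.to_csv() returns the CSV as a string when no path is given.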

Any help would be greatly appreciated, as I've been banging my head against the wall on this for quite some time now 😅

Thomas


Operating system used: macOS

Ignacio_Toledo

Hello @Thomas_il,

A couple of questions before I try to help:

  • Do you have a source dataset as an input to your Python recipe, or is it a Python recipe with only an output folder?
  • Do you want one folder for each daily file you upload, or do you want all the files in the same folder, following the filename pattern 'DAILY_%Y-%m-%d.csv'?

Once that's clear, I think I can give you some ideas.

Cheers

Thomas_il
Level 2
Author

Hello @Ignacio_Toledo,

It is indeed a Python recipe with no input (I am using Paramiko to pull data from an SFTP server).

Regarding the pattern, either option is fine for me. The point of the partitioned folder would be to pull the latest file from it, using max(folder.list_partitions()).
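Because day partition identifiers of the form YYYY-MM-DD are zero-padded, plain string order matches chronological order, so max() on folder.list_partitions() does return the latest day. The sketch below wraps that in a hypothetical helper, latest_day_partition, with two safeguards: a clear error when the folder has no partitions yet, and parsing each id with date.fromisoformat so that a malformed id raises instead of being ranked alphabetically. It assumes your partition ids look like "2022-05-29".

```python
import datetime


def latest_day_partition(partition_ids):
    # ASSUMPTION: ids are zero-padded ISO dates such as "2022-05-29".
    ids = list(partition_ids)
    if not ids:
        raise ValueError("the folder has no partitions yet")
    # Ranking by the parsed date makes a malformed id raise ValueError
    # rather than silently winning or losing a string comparison.
    return max(ids, key=datetime.date.fromisoformat)
```

Inside the recipe, the call would be latest_day_partition(output_folder.list_partitions()).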

Thank you for your time and help!

Thomas

Ignacio_Toledo

Hi @Thomas_il,

Given your use case, I don't think you need a partitioned folder at all: if you write all your files with the name format 'DAILY_%Y-%m-%d.csv' to the root of output_folder, you can easily get the latest file written with a few lines of Python:

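import glob
import os

import dataiku

output_folder = dataiku.Folder("AAAAAAAA")

# get_path() only works for folders stored on the DSS server's local filesystem
list_of_files = glob.glob(os.path.join(output_folder.get_path(), '*.csv'))
# Pick the file with the most recent ctime (on Linux: last metadata change time)
latest_file = max(list_of_files, key=os.path.getctime)
print(latest_file)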
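os.path.getctime ranks files by a filesystem timestamp, which can be misleading after a copy or restore (and on Linux it reflects the last metadata change rather than the creation time). Since every file name already carries its date, a variant is to rank files by the date parsed from the name. The sketch below uses a hypothetical helper, latest_daily_file, and assumes names follow DAILY_%Y-%m-%d.csv; files that do not match are ignored.

```python
import datetime
import posixpath

PREFIX = "DAILY_"
SUFFIX = ".csv"


def latest_daily_file(paths):
    """Return the path whose DAILY_%Y-%m-%d.csv name has the latest date, or None."""
    dated = []
    for path in paths:
        name = posixpath.basename(path)
        if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
            continue  # not a daily export
        stamp = name[len(PREFIX):-len(SUFFIX)]
        try:
            day = datetime.datetime.strptime(stamp, "%Y-%m-%d").date()
        except ValueError:
            continue  # right prefix/suffix, but not a valid date
        dated.append((day, path))
    if not dated:
        return None
    # Tuples compare by date first, so max() picks the newest day.
    return max(dated)[1]
```

Inside DSS, the paths could come from output_folder.list_paths_in_partition(), which, assuming your DSS version provides it, lists files relative to the folder root without needing get_path(). The chosen file could then be read with output_folder.get_download_stream(path).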

I think this is the simplest option. However, learning to use partitioned folders and datasets is worthwhile, so if you are interested in the alternative solution, let me know and I can share an example of that too.

Cheers

Thomas_il
Level 2
Author

Hi @Ignacio_Toledo,

Thanks for this solution, it works perfectly.

Indeed, I would be very happy to learn how to do this with partitioned folders! I will need those in the future, and knowing how they work would be great.

Again, thanks for your time and help.

Thomas
