Save pandas dataframe to .csv in managed S3 folder

osk
Level 1

Hi Dataiku team,

I have a quick question about managed S3 folders. I have a dataframe that I want to save as a .csv file in a managed S3 folder.

Reading the documentation, it sounds like I have to store the .csv file in a local folder on the DSS server first, and then upload it like this:




import dataiku

handle = dataiku.Folder("FolderName")
handle.upload_file(file_path="local_path_to_file", path=path_upload_file)
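
A fuller sketch of that two-step approach, with hypothetical file names, and the DSS-specific calls shown as comments so the sketch runs anywhere:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Step 1: write the dataframe to a local temporary file on the DSS server
tmp_path = os.path.join(tempfile.mkdtemp(), "data.csv")
df.to_csv(tmp_path, index=False)

# Step 2 (inside DSS): upload the local file into the managed folder
# handle = dataiku.Folder("FolderName")
# handle.upload_file(file_path=tmp_path, path="/data.csv")
```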


It works like this, but I feel there must be a better way of doing it.

So my question is: is there a way to write a dataframe directly to a managed S3 folder?



Thanks a lot for your help!



Best, 



Oliver



Nicolas_Servel
Dataiker
Re: Save pandas dataframe to .csv in managed S3 folder

Hello Oliver,



The Folder API also lets you retrieve a writer directly, which enables you to write incrementally to a specific path in the managed folder.



This writer can then be passed directly to pandas to save the dataframe.



Pandas will then write the dataframe directly to S3 if your managed folder is S3-based.



In your case, the code would look like:




import dataiku

handle = dataiku.Folder("FolderName")
path_upload_file = "path/in/folder/s3"
with handle.get_writer(path_upload_file) as writer:
    your_df.to_csv(writer, ...)
    # where ... is replaced by the other params you want for "to_csv"
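
This works because `to_csv` accepts any file-like object, not just a filename. As a quick illustration that runs outside DSS, `io.StringIO` can stand in for the folder writer (note: depending on your DSS version, the writer returned by `get_writer` may expect bytes, in which case `writer.write(your_df.to_csv(...).encode())` is an alternative):

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# StringIO stands in here for the writer returned by handle.get_writer()
buffer = io.StringIO()
df.to_csv(buffer, index=False)

csv_text = buffer.getvalue()
```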





Regards,



Nicolas Servel

osk
Level 1
Re: Save pandas dataframe to .csv in managed S3 folder
Hi Nicolas,

Thanks a lot for your help!

Best,
Oliver