Problem when using folder.upload_stream() to save files to a Dataiku folder
Hello community,
Right now I'm developing a Dataiku recipe that saves a Parquet file into a Dataiku folder.
First I call another service to get a dataframe and then convert that dataframe to Parquet. But after running the recipe, the Parquet file is always 0 KB with nothing inside. I used the folder.upload_stream() method provided by Dataiku, and I have verified that there is no problem with the dataframe itself:
import io

f = io.BytesIO()
df.to_parquet(f)
folder.upload_stream("name of file.parquet", f)
I don't know how to fix this problem. Does anyone have the same issue?
Thank you for your help
Answers
-
Hi,
The simplest would be to use the alternative form of to_parquet:
data = df.to_parquet()
folder.upload_data("name_of_file.parquet", data)
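For reference, a minimal end-to-end sketch of this approach; the folder name "my_folder" and the sample dataframe are placeholders, and calling to_parquet() without a path assumes pandas returns the Parquet content as bytes (which requires pyarrow or fastparquet to be installed):

import dataiku
import pandas as pd

# "my_folder" is a placeholder for the managed folder's name or id in your project
folder = dataiku.Folder("my_folder")

# Sample dataframe standing in for the one returned by the external service
df = pd.DataFrame({"a": [1, 2, 3]})

# With no path argument, to_parquet() returns the Parquet content as bytes
data = df.to_parquet()

# upload_data() writes raw bytes directly to a path inside the managed folder
folder.upload_data("name_of_file.parquet", data)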
-
Hi,
With this alternative way, we need to specify a path, and the Parquet file is generated at that path. The method itself returns None:
df.to_parquet(path="file_name.parquet")
I don't know if this will work with Dataiku. Even if it works, it will create an extra file somewhere.
Do you know of other ways to solve the problem?
Thanks a lot
-
Sorry, I found the answer. It's just a Python question rather than a Dataiku problem:
folder.upload_stream("name of file.parquet", f.getvalue())
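For anyone who hits the same thing: after df.to_parquet(f), the BytesIO cursor is left at the end of the buffer, so upload_stream() most likely reads nothing from the current position and writes an empty file. Passing f.getvalue() hands over the full bytes regardless of the cursor; rewinding the buffer with f.seek(0) before uploading should work as well. A minimal sketch, assuming a managed folder handle (the folder name "my_folder" and the dataframe here are placeholders):

import io

import dataiku
import pandas as pd

# Placeholders for illustration only
folder = dataiku.Folder("my_folder")
df = pd.DataFrame({"a": [1, 2, 3]})

f = io.BytesIO()
df.to_parquet(f)  # leaves the buffer cursor at the end of the written data

# Option 1: pass the raw bytes, which do not depend on the cursor position
folder.upload_stream("name of file.parquet", f.getvalue())

# Option 2: rewind the buffer first, then upload the stream itself
f.seek(0)
folder.upload_stream("name of file.parquet", f)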
Thank you for your time