Problem when using folder.upload_stream() to save files to Dataiku Folder

Blossom Registered Posts: 8 ✭✭✭✭

Hello community,

I'm developing a Dataiku recipe that saves a DataFrame as a Parquet file into a Dataiku managed folder.

First, I call another service to get a DataFrame and convert it to Parquet format. But after running the recipe, the Parquet file is always 0 KB with nothing inside. I'm using folder.upload_stream() provided by Dataiku, and I have verified that the DataFrame itself is fine:

import io

# df: the DataFrame returned by the other service; folder: a dataiku.Folder handle
f = io.BytesIO()
df.to_parquet(f)

folder.upload_stream("name of file.parquet", f)
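A likely explanation for the empty file, sketched with only the standard library: after df.to_parquet(f) writes into the io.BytesIO buffer, the buffer's position sits at the end of the data, so anything that reads the stream from its current position gets zero bytes. The byte string below is a hypothetical stand-in for real Parquet output, and the sketch assumes upload_stream() reads from the current position rather than rewinding first.

```python
import io

# Hypothetical stand-in for the bytes df.to_parquet(f) would write.
payload = b"PAR1 example parquet bytes PAR1"

f = io.BytesIO()
f.write(payload)

# After writing, the position is at the end of the buffer.
position_after_write = f.tell()

# A reader that consumes the stream from here sees nothing.
bytes_seen_without_rewind = f.read()

# Rewinding to the start makes the full content readable again.
f.seek(0)
bytes_seen_after_rewind = f.read()

# getvalue() returns the whole buffer regardless of the current position.
whole_buffer = f.getvalue()

print(position_after_write, len(bytes_seen_without_rewind), len(bytes_seen_after_rewind))
```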

I don't know how to fix this. Has anyone run into the same issue?

Thank you for your help

Answers

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker
    edited July 17

    Hi,

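    The simplest option is to use the alternative form of to_parquet, which returns the Parquet content as bytes when no path is given (pandas 1.2 and later), and then upload those bytes with upload_data: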

    data = df.to_parquet()
    folder.upload_data("name_of_file.parquet", data)
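On pandas versions where to_parquet() still requires a path, one workaround that avoids writing a temporary file is to let to_parquet() write into an in-memory buffer and then take the buffer's bytes with getvalue(), which can be passed to folder.upload_data(). The helper name parquet_bytes is hypothetical, and the FakeFrame class only stands in for a DataFrame so the sketch runs without pandas or pyarrow installed.

```python
import io


def parquet_bytes(df):
    """Return the Parquet serialization of df as bytes (hypothetical helper).

    Works with any object whose to_parquet() accepts a writable file-like
    object, as pandas.DataFrame.to_parquet does in the question above.
    """
    buf = io.BytesIO()
    df.to_parquet(buf)
    # getvalue() returns the full buffer contents, independent of the stream position.
    return buf.getvalue()


class FakeFrame:
    """Stand-in for a DataFrame; writes fixed bytes instead of real Parquet."""

    def to_parquet(self, f):
        f.write(b"PAR1 fake parquet body PAR1")


data = parquet_bytes(FakeFrame())
# In a Dataiku recipe this would then be: folder.upload_data("name_of_file.parquet", data)
```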
  • Blossom
    Blossom Registered Posts: 8 ✭✭✭✭
    edited July 17

    Hi,

    With this alternative, I have to specify a path, and the Parquet file is written to that path. The method itself returns None:

    df.to_parquet(path="file_name.parquet")

    I don't know if this will work with Dataiku, and even if it does, it will create an extra file somewhere on disk.

    Do you know of other ways to solve this?

    Thanks a lot

  • Blossom
    Blossom Registered Posts: 8 ✭✭✭✭
    edited July 17

    Sorry, I found the answer. It was a Python issue rather than a Dataiku problem:

    folder.upload_stream("name of file.parquet", f.getvalue())
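The fix above works because getvalue() returns the whole buffer as bytes, whatever the stream position. Since upload_stream() is described in the Dataiku API docs as taking a file-like object, two other options are to rewind the buffer with f.seek(0) before calling upload_stream(), or to pass the bytes to upload_data(). The StubFolder class below is hypothetical: it only mimics the assumed behavior of a dataiku.Folder (reading a stream from its current position, storing bytes as-is) so that the sketch runs outside DSS.

```python
import io


class StubFolder:
    """Hypothetical stand-in for dataiku.Folder, storing uploads in a dict."""

    def __init__(self):
        self.files = {}

    def upload_stream(self, path, f):
        # Assumed behavior: consume the file-like object from its current position.
        self.files[path] = f.read()

    def upload_data(self, path, data):
        # Assumed behavior: store the given bytes unchanged.
        self.files[path] = data


payload = b"PAR1 example parquet bytes PAR1"  # stand-in for df.to_parquet(f) output
folder = StubFolder()

# Original code: the buffer is still positioned at the end, so the upload is empty.
f = io.BytesIO()
f.write(payload)
folder.upload_stream("broken.parquet", f)

# Option 1: rewind the buffer before handing it to upload_stream().
f.seek(0)
folder.upload_stream("rewound.parquet", f)

# Option 2: pass the complete bytes to upload_data().
folder.upload_data("as_bytes.parquet", f.getvalue())
```

Rewinding keeps the streaming interface, while upload_data() makes it explicit that the whole file is already in memory as bytes.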

    Thank you for your time
