
UnicodeDecodeError while uploading xlsx to an S3 managed folder


Hi Dataiku-Team,

I am having trouble uploading files produced by my flow to a managed folder.

Table_Result.to_excel("Table_Result.xlsx")
results_folder.upload_file("Table_Result.xlsx", file_path="Table_Result.xlsx")

I got this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 11: invalid start byte

 

So my guess was that I had to encode my file to match the decoding format:

Table_Result.to_excel("Table_Result.xlsx", encoding="UTF-8")
results_folder.upload_file("Table_Result.xlsx", file_path="Table_Result.xlsx")

But I got this error again:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 18: invalid start byte

I don't understand how a UTF-8 encoded file cannot be decoded with the UTF-8 codec.

So, is there any way to handle this error, or do I have to change my file format to make sure that it is compatible with the managed folder?
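
For context, here is a minimal sketch of why this kind of error can appear, assuming that some step of the upload treats the file contents as text. An .xlsx file is a ZIP-based binary container, not UTF-8 text, so it can contain bytes such as 0x82 that are never valid at the start of a UTF-8 character. The bytes below are made up for illustration and are not taken from the actual file; `try_decode` is a hypothetical helper.

```python
# Hand-made bytes for illustration only: 11 ASCII bytes followed by 0x82.
# 0x82 is a UTF-8 continuation byte, so it can never start a character.
fake_binary = b"PK-not-text" + bytes([0x82])


def try_decode(data):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)


message = try_decode(fake_binary)
print(message)
```

If this is indeed the cause, re-encoding the DataFrame would not help: the content has to be handled as raw bytes from end to end.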

3 Replies
Dataiker

Hi @Maxime_GM 

Save your DataFrame into a variable and upload that.

df = Table_Result.to_excel()

 

results_folder.upload_data("Table_Result.xlsx", df)

 

Author

Actually, the pandas to_excel method doesn't return anything, but I found another way to get its value.

Author

I found the solution here: Save pandas dataframe to .xlsx in managed S3 folder

I had to create a BytesIO buffer to write my DataFrame into, then upload its value as a stream to the folder.

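import io

# Write the workbook into an in-memory binary buffer instead of a local file
buf = io.BytesIO()
Table_Result.to_excel(buf)

# Upload the buffer's raw bytes to the managed folder
results_folder.upload_stream('Table_Result.xlsx', buf.getvalue())
buf.close()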
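
The same idea can be wrapped in a small reusable helper. This is only a sketch: `upload_via_buffer` is a hypothetical name, and it assumes the folder object exposes an `upload_data(path, data)` method that accepts raw bytes, as used in the reply above.

```python
import io


def upload_via_buffer(folder, path, write_fn):
    """Let write_fn fill an in-memory buffer, then upload the raw bytes.

    folder   : object with an upload_data(path, data) method (assumed to be
               a managed-folder handle such as results_folder)
    write_fn : callable that writes binary content to a file-like object,
               for example Table_Result.to_excel
    """
    with io.BytesIO() as buf:
        write_fn(buf)
        data = buf.getvalue()  # copy the bytes out before the buffer closes
    folder.upload_data(path, data)
    return len(data)


# Hypothetical usage inside a recipe:
# upload_via_buffer(results_folder, "Table_Result.xlsx", Table_Result.to_excel)
```

The `with` block closes the buffer even if writing fails, and no temporary file is created on local disk.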