
UnicodeDecodeError while uploading xlsx to S3 managed folder

Solved!
Maxime_GM
Level 2

Hi Dataiku-Team,

I am having trouble uploading files produced by my flow to a managed folder.

Table_Result.to_excel("Table_Result.xlsx")
results_folder.upload_file("Table_Result.xlsx", file_path="Table_Result.xlsx")

I got this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 11: invalid start byte

 

So my guess was that I had to encode my file to match the decoding format:

Table_Result.to_excel("Table_Result.xlsx", encoding="UTF-8")
results_folder.upload_file("Table_Result.xlsx", file_path="Table_Result.xlsx")

But I got this error again:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 18: invalid start byte

I don't understand how a UTF-8 encoded file cannot be decoded with the UTF-8 codec.

So, is there any way to handle this error, or do I have to change my file format to make sure the file is compatible with the managed folder?
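A note on why the `encoding` argument may not help here: an .xlsx file is a ZIP container, so it is binary data rather than text, and the traceback suggests its content is being decoded as text somewhere on the upload path. The sketch below uses only the standard library to illustrate that. `is_utf8_text` and `make_zip_bytes` are hypothetical helpers introduced for illustration, and the tiny archive is only a stand-in for a real workbook.

```python
import io
import zipfile


def is_utf8_text(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 text."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def make_zip_bytes() -> bytes:
    """Build a tiny in-memory ZIP archive (stand-in for an .xlsx file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("xl/workbook.xml", "<workbook/>")
    return buf.getvalue()


archive_bytes = make_zip_bytes()
print(archive_bytes[:2])                      # b'PK', the ZIP signature
print(is_utf8_text("café".encode("utf-8")))   # True: genuine UTF-8 text
print(is_utf8_text(bytes([0x82])))            # False: 0x82 is not a valid start byte
```

The last line uses the same byte that appears in the first error message: 0x82 can only be a continuation byte in UTF-8, so any attempt to read binary content containing it at the start of a character will fail regardless of how the workbook's text cells were encoded.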

1 Solution
Maxime_GM
Level 2
Author

I found the solution here: Save pandas dataframe to .xlsx in managed S3 folder

I had to create a BytesIO buffer, write my DataFrame into it, then upload the buffer's contents to the folder as a stream.

import io

# Write the workbook into memory instead of a local file
buf = io.BytesIO()
Table_Result.to_excel(buf)

# Upload the raw bytes to the managed folder
results_folder.upload_stream('Table_Result.xlsx', buf.getvalue())
buf.close()
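The same pattern can be wrapped in a small reusable helper. This is only a sketch: `upload_via_buffer` and `InMemoryFolder` are hypothetical names, and the only thing assumed about the folder object is that it exposes `upload_stream(path, data)` the way it is called above. The stand-in folder lets the helper run outside Dataiku.

```python
import io
from typing import Any, Callable


def upload_via_buffer(folder: Any, path: str,
                      write: Callable[[io.BytesIO], Any]) -> int:
    """Serialize into an in-memory buffer, then upload the raw bytes.

    `folder` is assumed to expose upload_stream(path, data), as used in
    this thread. Returns the number of bytes uploaded.
    """
    with io.BytesIO() as buf:
        write(buf)                  # e.g. Table_Result.to_excel
        payload = buf.getvalue()    # copy the bytes before the buffer closes
    folder.upload_stream(path, payload)
    return len(payload)


class InMemoryFolder:
    """Hypothetical stand-in for a managed folder, for local testing only."""

    def __init__(self) -> None:
        self.files = {}

    def upload_stream(self, path: str, data: bytes) -> None:
        self.files[path] = bytes(data)


demo_folder = InMemoryFolder()
size = upload_via_buffer(demo_folder, "demo.bin",
                         lambda buf: buf.write(b"\x82\x81 binary"))

# In a recipe (assumption: Table_Result is a pandas DataFrame and
# results_folder is the managed folder handle from the question):
# upload_via_buffer(results_folder, "Table_Result.xlsx", Table_Result.to_excel)
```

Because the bytes never pass through a text decoder, payloads containing bytes such as 0x82 or 0x81 are stored unchanged.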

3 Replies
Liev
Dataiker Alumni

Hi @Maxime_GM 

Save your DataFrame into a variable and upload that.

df = Table_Result.to_excel()
results_folder.upload_data("Table_Result.xlsx", df)


Maxime_GM
Level 2
Author

Actually, the pandas to_excel method doesn't return anything (it returns None), but I found another way to get the file contents.
