UnicodeDecodeError while uploading xlsx to an S3 managed folder
Hi Dataiku-Team,
I am having trouble uploading files produced by a flow to a managed folder.
Table_Result.to_excel("Table_Result.xlsx")
results_folder.upload_file("Table_Result.xlsx", file_path="Table_Result.xlsx")
I got this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 11: invalid start byte
So my guess was that I had to encode my file to match the decoding format:
Table_Result.to_excel("Table_Result.xlsx", encoding="UTF-8")
results_folder.upload_file("Table_Result.xlsx", file_path="Table_Result.xlsx")
But I got this error again:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 18: invalid start byte
I don't understand how a UTF-8 encoded file cannot be decoded with the UTF-8 codec.
So, is there any way to handle this error, or do I have to change my file format to make sure that my file is compatible with the managed folder?
Best Answer
-
I found the solution here: Save pandas dataframe to .xlsx in managed S3 folder
I had to create a BytesIO buffer, write my DataFrame into it, then upload its contents as a stream to the folder.
import io
buf = io.BytesIO()
Table_Result.to_excel(buf)
results_folder.upload_stream('Table_Result.xlsx', buf.getvalue())
buf.close()
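One plausible explanation for the original error (an assumption, not something confirmed in this thread): an .xlsx file is a ZIP archive, so it is binary data, and any step that tries to read it as UTF-8 text will fail no matter which encoding argument is passed to to_excel. The stdlib-only sketch below builds a tiny ZIP archive as a stand-in for a real workbook and checks whether its bytes decode as UTF-8. The names make_zip_bytes and is_valid_utf8 are illustrative only.

```python
import io
import zipfile


def make_zip_bytes():
    """Build a tiny ZIP archive in memory (a stand-in for a real .xlsx workbook)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        # Real workbooks hold several XML parts; one placeholder part is enough here.
        archive.writestr("xl/workbook.xml", "<workbook/>")
    return buf.getvalue()


def is_valid_utf8(data):
    """Return True if `data` (bytes) can be decoded as UTF-8 text."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


payload = make_zip_bytes()
print(payload[:2])             # first two bytes of the ZIP signature
print(is_valid_utf8(payload))  # whether the archive bytes are valid UTF-8 text
```

If that assumption holds, keeping the content as bytes from start to finish, as the BytesIO approach does, avoids the text decoding step entirely.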
Answers
-
Hi @Maxime_GM
Save your data frame into a variable and upload that.
df = Table_Result.to_excel()
results_folder.upload_data("Table_Result.xlsx", df)
-
Actually, the pandas to_excel method doesn't return anything, but I found another way to get its value.
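For reference, a minimal sketch of capturing that value, assuming the DataFrame's to_excel accepts a file-like target and that the folder's upload_data takes raw bytes, as the answer above suggests. The helper name dataframe_to_xlsx_bytes is hypothetical, and the usage lines are left as comments because they need pandas and a Dataiku folder handle.

```python
import io


def dataframe_to_xlsx_bytes(df, **to_excel_kwargs):
    """Return the .xlsx content of `df` as bytes (hypothetical helper).

    `df` is expected to behave like a pandas DataFrame: its to_excel method
    writes into the target it is given and returns None.
    """
    with io.BytesIO() as buf:
        df.to_excel(buf, **to_excel_kwargs)  # writes into buf, returns None
        return buf.getvalue()                # copy the bytes out before buf closes


# Usage, assuming Table_Result is a DataFrame and results_folder is a managed folder:
# data = dataframe_to_xlsx_bytes(Table_Result)
# results_folder.upload_data("Table_Result.xlsx", data)
```

This keeps the variable-then-upload shape of the suggestion above while working around the fact that to_excel itself returns nothing.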