When I read my file with UTF-8 encoding, Dataiku parses '\ufeff' (the byte order mark) as part of my first column header. From what I have found, one possible solution is to use the utf-8-sig encoding, but Dataiku does not support it.
Any help would be appreciated!
Hello david1002liu,
As you say, utf-8-sig is not recognized, and that’s why you see that warning. Which DSS version and notebook code environment are you using? I cannot replicate this behaviour with DSS version 9.0.2: when I read a CSV file with a BOM, the BOM is ignored.
Having said that, do note that according to the Unicode standard, “use of a BOM is neither required nor recommended for UTF-8” (see Section 2.6), so alternatively you could look into removing it from the CSV file before reading it into DSS.
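For removing the BOM before the file reaches DSS, here is a minimal sketch in plain Python. It assumes the file is small enough to read into memory; the function names and the paths passed to them are hypothetical, not part of any DSS API.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_from_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, dropping a leading UTF-8 BOM (hypothetical helper)."""
    # Work in binary mode so nothing else in the file is re-encoded.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(strip_utf8_bom(data))
```

Calling `strip_bom_from_file("input.csv", "input_nobom.csv")` would write a copy that plain UTF-8 readers parse without the stray '\ufeff' in the first header. Files without a BOM pass through unchanged.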
Best regards,
Juan Eiros Zamora
Technical Support Engineer, Dataiku
Hi,
Alternatively, you can begin your flow with a Prepare recipe containing a "Rename column" processor. Rename the first column to the same name, typing it out by hand so the new name does not carry the hidden character. This should remove the byte order mark from the output dataset.
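If you would rather do the same rename in code (for example in a Python recipe or notebook), a rough equivalent is to strip '\ufeff' from the column names after reading. This is only a sketch using the standard library, not the Prepare recipe itself; `clean_header` and the sample data are made up for illustration.

```python
import csv
import io


def clean_header(names):
    """Remove any leading U+FEFF (byte order mark) from each column name."""
    return [name.lstrip("\ufeff") for name in names]


# Sample CSV text decoded as plain UTF-8, so the BOM survives as '\ufeff'.
raw = "\ufeffid,name\n1,Ada\n"

rows = list(csv.reader(io.StringIO(raw)))
header = clean_header(rows[0])  # first column becomes 'id' instead of '\ufeffid'
records = [dict(zip(header, row)) for row in rows[1:]]
```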
Best,
Pat