When I read my file with UTF-8 encoding, Dataiku parses '\ufeff' as part of my first column header. After some research, one possible solution is to use the utf-8-sig encoding; however, Dataiku does not support it.
Any help would be appreciated!
As you say, utf-8-sig is not recognized and that’s why you see that warning. Which DSS version and notebook code environment are you using? I cannot replicate this behaviour with DSS version 9.0.2 (when I read a CSV file with a BOM, it is ignored).
Having said that, note that according to the Unicode standard, “use of a BOM is neither required nor recommended for UTF-8” (see Section 2.6), so alternatively you could remove it from the CSV file before reading it into DSS.
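If you go the route of removing the BOM from the file first, here is a minimal sketch in plain Python. The function name `strip_utf8_bom` and the file paths are only placeholders for illustration, and it assumes the file is small enough to read into memory in one go:

```python
import codecs
from pathlib import Path


def strip_utf8_bom(src_path, dst_path):
    """Copy src_path to dst_path, dropping a leading UTF-8 BOM if present."""
    data = Path(src_path).read_bytes()
    # codecs.BOM_UTF8 is the three bytes b'\xef\xbb\xbf'
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    Path(dst_path).write_bytes(data)


# Hypothetical usage, run before uploading the file to DSS:
# strip_utf8_bom("input_with_bom.csv", "input_clean.csv")
```

Files without a BOM are copied unchanged, so it should be safe to run on every file before uploading.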
Juan Eiros Zamora
Technical Support Engineer, Dataiku
Alternatively, you can begin your flow with a Prepare recipe and a "Rename column" processor. Rename the first column to the same name, typing it out by hand rather than copying it, so that the invisible character is not carried over. This should remove the byte order mark from the output dataset.
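If you are already working in a Python notebook, the same rename can be sketched in code instead of a Prepare recipe. This is a different route to the same result, not the processor itself, and the helper name `strip_bom_from_names` and the sample column names are made up for illustration:

```python
BOM = "\ufeff"  # the byte order mark as it appears once decoded to text


def strip_bom_from_names(names):
    """Return the column names with any leading BOM character removed."""
    return [
        name.lstrip(BOM) if isinstance(name, str) else name
        for name in names
    ]


columns = ["\ufeffid", "name", "value"]
print(strip_bom_from_names(columns))  # ['id', 'name', 'value']
```

With a pandas DataFrame, assuming that is what your notebook uses, you could apply it as `df.columns = strip_bom_from_names(df.columns)`.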