
BOM of a CSV detected as part of the first column's header

david1002liu
Level 1

When I read the file with UTF-8 encoding, Dataiku parses '\ufeff' as part of my first column's header. After some research, one possible solution is to use the UTF-8-sig encoding; however, Dataiku does not support it.

Screenshot 2021-08-17 130131.png

Screenshot 2021-08-17 130222.png

Any help would be appreciated!

JuanE
Dataiker

Hello david1002liu,

As you say, utf-8-sig is not recognized, and that's why you see that warning. Which DSS version and notebook code environment are you using? I cannot replicate this behaviour with DSS version 9.0.2 (when I read a CSV file with a BOM, the BOM is ignored).

Having said that, do note that according to the Unicode Standard, “use of a BOM is neither required nor recommended for UTF-8” (see Section 2.6), so alternatively you could look into removing it from the CSV file before reading it into DSS.
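As a rough illustration of removing the BOM outside DSS, here is a minimal Python sketch. It assumes the file is small enough to read into memory; the helper names `strip_utf8_bom` and `rewrite_without_bom` and the sample data are made up for this example and are not part of any Dataiku API.

```python
import codecs


def strip_utf8_bom(raw: bytes) -> bytes:
    """Return `raw` without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def rewrite_without_bom(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, dropping a leading UTF-8 BOM (hypothetical helper)."""
    with open(src_path, "rb") as src:
        data = src.read()  # reads the whole file; assumes it fits in memory
    with open(dst_path, "wb") as dst:
        dst.write(strip_utf8_bom(data))


# Illustrative sample: a tiny CSV that starts with a UTF-8 BOM.
sample = codecs.BOM_UTF8 + "id,name\n1,Ada\n".encode("utf-8")

# Decoding as plain UTF-8 keeps the BOM as '\ufeff' at the start of the header.
header_with_bom = sample.decode("utf-8").splitlines()[0]

# After stripping the BOM bytes, the header is clean.
header_clean = strip_utf8_bom(sample).decode("utf-8").splitlines()[0]

print(repr(header_with_bom))
print(repr(header_clean))
```

In plain Python, decoding with the `utf-8-sig` codec has the same effect as stripping the BOM first, which is why that encoding is often suggested; rewriting the file is only needed when the reader does not offer it.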

Best regards,

 

Juan Eiros Zamora

Technical Support Engineer, Dataiku

 
