Bug: Encoding hash values with Base64, SHA1, SHA256, SHA512
Hi all,
I'm not sure if this is a bug, but it looks like one to me:
Scenario: I tried to hash strings with SHA1 within a preparation recipe and found that several outputs got the same hash, even though the input strings differed.
I eventually narrowed down the cause: in some cases the input value has a notation like "3151E19012".
Encoding such a value with toBase64 and decoding it again with fromBase64 showed the value "Infinity".
I guess that Dataiku interprets the value as a number rather than a string, so 3151E19012 is read as a number in exponential notation that is out of range, and the value Infinity is passed to the function.
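To illustrate the suspected mechanism outside of Dataiku, here is a minimal Python sketch. The autotype() helper is hypothetical and only mimics "try to read the value as a number first"; it is not Dataiku's actual implementation. The second input value is made up for the example. Python prints the overflowed float as "inf" where Dataiku showed "Infinity".

```python
import hashlib


def autotype(value: str):
    """Hypothetical stand-in for per-value autotyping: number if it parses, else string."""
    try:
        return float(value)  # "3151E19012" parses as 3151 x 10^19012 and overflows
    except ValueError:
        return value


def sha1_of(value) -> str:
    """SHA1 hex digest of the textual form of a value."""
    return hashlib.sha1(str(value).encode("utf-8")).hexdigest()


# Two different references that both look like exponential notation.
a = autotype("3151E19012")
b = autotype("4711E20000")

# Both overflow to infinity, so their hashes collide.
collides_when_autotyped = sha1_of(a) == sha1_of(b)

# Hashing the original strings keeps them distinct.
collides_as_strings = sha1_of("3151E19012") == sha1_of("4711E20000")

print(a, b, collides_when_autotyped, collides_as_strings)
```

If the values are read as numbers, every reference of this shape collapses to the same infinite value before the hash function ever sees it, which would explain identical hashes for different inputs.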
Is this a known issue?
Answers
-
Hi,
You would need to force the storage type of your output column as string to avoid that.
-
Hi @Clément_Stenac, thanks for the quick response.
I enforced that by converting the input with toString() first.
Here's how the input looks:
fromBase64(toBase64(toString(CY_MANUFACTURER_REFERENCE)))
The output looks as follows:
This should be reproducible.
For the moment I have a workaround with a regular expression that splits the string. But in general the behaviour shouldn't be like that.
-
Hi,
OK, I hadn't understood that you wanted to use it in a formula.
You will need to use strval("CY_MANUFACTURER_REFERENCE") instead of just CY_MANUFACTURER_REFERENCE
This is explained here: https://doc.dataiku.com/dss/latest/advanced/formula.html#variables-typing-and-autotyping
This behavior will not be changed:
- When the formula is evaluated, the type of the column is not yet computed, so each value must be considered independently and is autotyped, unless you use strval()
- Changing it would be an unacceptable break in backwards compatibility
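The difference between per-value autotyping and forcing a string read can be sketched in Python. This is only an analogy under stated assumptions: autotype() and roundtrip() are hypothetical helpers, the column values other than "3151E19012" are invented, and the exact text Dataiku produces for numbers may differ from Python's.

```python
import base64


def autotype(raw: str):
    """Hypothetical per-value autotyping: each value is judged on its own."""
    try:
        return float(raw)
    except ValueError:
        return raw


def roundtrip(value) -> str:
    """Base64-encode the textual form of a value, then decode it again."""
    text = str(value)
    encoded = base64.b64encode(text.encode("utf-8"))
    return base64.b64decode(encoded).decode("utf-8")


# Invented sample column; only some values happen to look numeric.
column = ["ABC-123", "3151E19012", "0042"]

# Referencing the column directly: each value is autotyped independently.
autotyped = [roundtrip(autotype(v)) for v in column]

# Analogue of strval("column"): every value stays a string.
as_strings = [roundtrip(v) for v in column]

print(autotyped)
print(as_strings)
```

In this sketch only the string read returns every value unchanged; the autotyped read alters any value that happens to parse as a number, which is the situation strval() is meant to avoid.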
-
Thanks, that solution works fine!