The ability to turn off cell-level "Duck Typing" within a column for visual recipes
tgb417
User Story:
As a Data Analyst whose assumptions are aligned with the column-based typing of relational SQL databases, I'd like to be able to turn off the cell-level "Duck Typing" that can occur within a single column in DSS visual recipes, and establish some sort of consistent column-based typing. This would make visual recipe steps behave more consistently within a column as input data changes over time.
COS:
- Current behavior would remain the default for backward compatibility.
Nice to Have Items:
- It would be helpful if this behavior was changeable from one visual recipe / visual recipe step to the next.
Notes:
- This problem is particularly noticeable when string columns contain some cells with strings that take on the form of numbers and get changed to numbers.
- I've opened a support ticket for this situation: [#23180] "Inconsistent Behaviour with Regex and Text being converted to Numbers"
- This might be implemented as a button that appears on each step, picking an assumed column format. Alternatively, the column's "storage" type could be treated as the type of all data in the column.
- Here is an example of how this duck typing behavior can lead to confusion.
Notice in the second column how the numbers are converted to integers in some cases, stripping leading 0s, and in other cases truncated to exponential formatting, changing the length of the original data in the process.
- In the last column, where a regular expression is used to find numbers, things work as expected.
- There may also be a bug in the split recipe, where number-looking strings are converted to integers and then fail to match regular expressions.
- This shows a regular expression-based split. However, the strictly numeric values beginning with leading 0s are not being captured.
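To make the notes above concrete outside of DSS, here is a rough sketch in plain Python of the kind of behavior I'm describing. It is not DSS's actual logic: `duck_type_cell` and `column_typed` are hypothetical helpers I made up for illustration, and the sample values are invented. It only covers the leading-zero case, not the exponential formatting one.

```python
import re

# Hypothetical stand-in for per-cell "duck typing"; NOT DSS's actual logic.
def duck_type_cell(value: str):
    """Return an int or float when the cell looks numeric, else the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value

# Hypothetical column-level alternative: every cell keeps the declared type.
def column_typed(values, declared_type=str):
    return [declared_type(v) for v in values]

# Invented sample data: a string column where some cells look like numbers.
codes = ["00123", "AB-77", "90210"]

per_cell = [duck_type_cell(v) for v in codes]
per_column = column_typed(codes, str)

# A pattern meant to capture all-digit values that start with a zero.
leading_zero = re.compile(r"^0\d+$")

matches_per_cell = [v for v in per_cell if leading_zero.match(str(v))]
matches_per_column = [v for v in per_column if leading_zero.match(v)]

print(per_cell)            # [123, 'AB-77', 90210]
print(matches_per_cell)    # []
print(matches_per_column)  # ['00123']
```

With per-cell inference, "00123" becomes 123 before the regular expression ever sees it, so the pattern finds nothing. With a single declared type for the whole column, the original string survives and the match succeeds, which is the behavior I'd like to be able to opt into.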
cc: @AshleyW