
The ability to turn off cell-level "Duck Typing" within a column for visual recipes


User Story:

As a Data Analyst whose assumptions are aligned with the column-based typing of relational SQL databases, I'd like to be able to turn off the cell-level "Duck Typing" that can occur within a single column in DSS visual recipes, and establish consistent column-based typing instead. This would make visual recipe steps behave more consistently within a column as the input data changes over time.
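To make the difference concrete, here is a minimal Python sketch. It assumes that per-cell inference parses any numeric-looking string as a 64-bit float and re-renders whole numbers as integers. This is a hypothetical model of the symptom, not how DSS actually implements type inference; the names `infer_cell`, `render_duck_typed`, and `render_column_typed` are illustrative only.

```python
# Hypothetical model: per-cell ("duck") typing guesses a type for each value
# independently, while column typing applies one declared type to every cell.

def infer_cell(value: str):
    """Guess a type for a single cell, the way per-cell inference might."""
    try:
        number = float(value)
    except ValueError:
        return value  # not numeric: keep the original string
    # Whole numbers below 1e16 become ints (leading zeros are lost); larger
    # magnitudes stay floats, which Python prints in exponential notation.
    if number.is_integer() and abs(number) < 1e16:
        return int(number)
    return number


def render_duck_typed(column):
    """Re-render a column after per-cell inference (original text may change)."""
    return [str(infer_cell(v)) for v in column]


def render_column_typed(column, storage_type=str):
    """Re-render a column using one declared storage type for every cell."""
    return [str(storage_type(v)) for v in column]


ids = ["00123", "ABC-7", "12345678901234567890", "0042"]
print(render_duck_typed(ids))    # cells are changed individually
print(render_column_typed(ids))  # cells keep their original text
```

Under this model, "00123" and "0042" lose their leading zeros and the 20-digit value comes back in exponential notation, while the column-typed version returns every value unchanged.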

Conditions of Satisfaction (COS):

  • The current behavior would remain the default, for backward compatibility.

Nice to Have Items:

  • It would be helpful if this behavior could be set separately for each visual recipe, or even for each step within a visual recipe.
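Taken together, the backward-compatibility condition and the per-step nice-to-have suggest a simple precedence rule. The Python sketch below is hypothetical: `TypingMode`, `DEFAULT_MODE`, and `effective_mode` are illustrative names, not existing DSS settings. It assumes the most specific setting wins (step, then recipe), with the current cell-level behavior as the fallback.

```python
from enum import Enum
from typing import Optional


class TypingMode(Enum):
    CELL = "cell"      # current behavior: infer a type per cell
    COLUMN = "column"  # proposed: one consistent type per column


# Hypothetical global default that preserves today's behavior.
DEFAULT_MODE = TypingMode.CELL


def effective_mode(step_override: Optional[TypingMode] = None,
                   recipe_setting: Optional[TypingMode] = None) -> TypingMode:
    """Resolve the typing mode: step override, then recipe setting, then default."""
    if step_override is not None:
        return step_override
    if recipe_setting is not None:
        return recipe_setting
    return DEFAULT_MODE


# A recipe opts into column typing, but one step keeps the old behavior.
print(effective_mode(recipe_setting=TypingMode.COLUMN))
print(effective_mode(TypingMode.CELL, TypingMode.COLUMN))
```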

Notes:

  • This problem is most noticeable when a string column contains some cells whose strings look like numbers; those cells get converted to numbers.
  • I've opened a support ticket for this situation: [#23180] "Inconsistent Behaviour with Regex and Text being converted to Numbers".
  • This might be implemented as a button that appears on each step for picking an assumed column format. Alternatively, the setting might declare that the column's "storage" type is the type used for all the data in that column.
  • Here is an example of how this duck typing behavior can lead to confusion.
    [Image: Visual Recipy Text Behaving some time like numbers.jpg]

     

    • Notice in the second column that some values are converted to integers, stripping their leading 0s, while others are truncated into exponential formatting. Both change the length of the original data.

    • In the last column, where a regular expression is used to find numbers, things work as expected.
    • There may also be a bug in the split recipe, where number-looking strings are changed to integers and then fail to match the regular expression.
  • [Image: Split Recipie.jpg] This shows a regular-expression-based split. However, the purely numeric values that begin with leading 0s are not being captured.
    [Image: In the Wrong Table.jpg]

    cc: @AshleyW 
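The split symptom is consistent with a value being converted before the regular expression is applied. That ordering is a hypothesis, not confirmed DSS internals. The Python sketch below shows how it would make leading-zero values disappear from the matches. The pattern `LEADING_ZERO_NUMBER` and the helpers `duck_type` and `matches` are hypothetical names for illustration.

```python
import re

# Hypothetical pattern: a digits-only value that starts with at least one 0.
LEADING_ZERO_NUMBER = re.compile(r"0+\d*")


def duck_type(value: str) -> str:
    """Mimic per-cell inference: a digits-only string is re-rendered as an int."""
    return str(int(value)) if value.isdecimal() else value


def matches(values, pattern, convert=None):
    """Return the original values whose (optionally converted) text fully matches."""
    return [v for v in values
            if pattern.fullmatch(convert(v) if convert else v)]


cells = ["000451", "451", "X-0451", "007"]
print(matches(cells, LEADING_ZERO_NUMBER))             # regex on the original text
print(matches(cells, LEADING_ZERO_NUMBER, duck_type))  # regex after per-cell conversion
```

Applied to the original text, the pattern captures "000451" and "007". Applied after the per-cell conversion, those values have become "451" and "7", so nothing matches.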