split formula for more than one delimiter

Options
ESoto
ESoto Registered Posts: 15

Hello everyone,

I attempted to use the split function for the delimiter that has the value ",". However, I also need it to be split by the value "()".

When using just the split function, I attempted to type in both the comma as the indicator and the parenthesis as the indicator, but it seems to be limited only to one and I can't have them split separately because the rows and columns have mistakes where the split occurred.

How would I do this in a formula instead?

Answers

  • LouisDHulst
    LouisDHulst Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Neuron, Registered, Neuron 2023 Posts: 44 Neuron
    Options

    Hi @ESoto
    ,

    Looks like you need to use some Regex logic in the split formula. If you want to split on both "," and "()" you can try using split(<your-col-to-split>, /,|\(\)/) .

  • ESoto
    ESoto Registered Posts: 15
    Options

    Thank you so much for the reply @LouisDHulst
    .

    I attempted this but it is saying there is an error, although the syntax is valid. It's saying besides expecting the Regex it needs an optional Boolean.

  • LouisDHulst
    LouisDHulst Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Neuron, Registered, Neuron 2023 Posts: 44 Neuron
    edited July 17
    Options

    According to the doc for split() the function has an additional argument called preserveAllTokens, which you can set to True if you want to keep empty segments. The example given is

    split(""hello"", ""he"", asBool(1)) returns ["""",""llo""]

    This doesn't seem to work with Regex though. If you have any strings that start with "," or "()", you'll get an empty string in the output array:

    split(",ABCDE", /(,|\(\))/, asBool(1)) returns ["","ABCDE"]

    If you want to get rid of the empty string you could use a Replace processor that replaces "", with nothing.

Setup Info
    Tags
      Help me…