split formula for more than one delimiter
Hello everyone,
I attempted to use the split function for the delimiter that has the value ",". However, I also need it to be split by the value "()".
When using just the split function, I attempted to type in both the comma as the indicator and the parenthesis as the indicator, but it seems to be limited only to one and I can't have them split separately because the rows and columns have mistakes where the split occurred.
How would I do this in a formula instead?
Answers
-
LouisDHulst Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Neuron, Registered, Neuron 2023 Posts: 54 Neuron
Hi @ESoto
,Looks like you need to use some Regex logic in the split formula. If you want to split on both "," and "()" you can try using split(<your-col-to-split>, /,|\(\)/) .
-
Thank you so much for the reply @LouisDHulst
.I attempted this but it is saying there is an error, although the syntax is valid. It's saying besides expecting the Regex it needs an optional Boolean.
-
LouisDHulst Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Neuron, Registered, Neuron 2023 Posts: 54 Neuron
According to the doc for split() the function has an additional argument called preserveAllTokens, which you can set to True if you want to keep empty segments. The example given is
split(""hello"", ""he"", asBool(1)) returns ["""",""llo""]
This doesn't seem to work with Regex though. If you have any strings that start with "," or "()", you'll get an empty string in the output array:
split(",ABCDE", /(,|\(\))/, asBool(1)) returns ["","ABCDE"]
If you want to get rid of the empty string you could use a Replace processor that replaces "", with nothing.