
In a split recipe, how to split on a string value coming in an integer type column?

Level 3

I am trying to split a dataset into two based on an id column. Most values in it are numbers, but some string values (names) also come in from the source. I want to send the good data (where the id is a number) to dataset 1 and all rows with an invalid id value to dataset 2. How can I do this?

When I use the split recipe, it automatically treats the id column as an integer and shows only comparison operators in the dropdown (e.g. ==, <=, etc.). It does not allow me to type in a string value or a regex match.

3 Replies
Level 3
Author

Example of data values in the id column:

id

123

435

ABC

321

Dataiker

Hi,

You can use the mode where you define filters for the "good" dataset and the "bad" dataset, with the filters defined as formulas. For example, with an id column named "a", this gives:

[Screenshot: Screenshot 2020-05-06 at 09.35.26.png]

Regards,

    Frederic

Level 3
Author

Thank you, Frederic, for the quick reply. Is there a way to use a regex in that filter?

If any kind of non-numeric value comes into that column (e.g. alphanumeric values, plain strings, spaces, nulls, etc.), I want to send all of it to the bad dataset in one shot, while all numeric values (e.g. those matching the regex ^\d+$) go to the good dataset.
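
To make the intended rule concrete, here is a minimal sketch in plain Python of the split I have in mind. It is not a Dataiku formula or API call; the function name `split_rows`, the `id_key` parameter, and the sample rows are hypothetical and only illustrate the logic. It assumes the id values arrive as strings (or None for nulls) and uses `[0-9]+` with a full match, which is the same idea as ^\d+$ restricted to ASCII digits.

```python
import re

# Hypothetical pattern: one or more ASCII digits and nothing else.
NUMERIC_ID = re.compile(r"[0-9]+")


def split_rows(rows, id_key="id"):
    """Split rows into (good, bad) depending on whether the id is purely numeric.

    Assumption: ids are strings, or None for nulls. Anything that is not a
    string made only of digits (names, alphanumerics, spaces, nulls) is "bad".
    """
    good, bad = [], []
    for row in rows:
        value = row.get(id_key)
        # fullmatch requires the whole value to match, so "12a" and " " fail.
        if isinstance(value, str) and NUMERIC_ID.fullmatch(value):
            good.append(row)
        else:
            bad.append(row)
    return good, bad


# Sample values taken from the example above, plus a few bad cases.
rows = [
    {"id": "123"},
    {"id": "435"},
    {"id": "ABC"},
    {"id": "321"},
    {"id": " "},
    {"id": None},
    {"id": "12a"},
]
good, bad = split_rows(rows)
```

With these sample rows, 123, 435 and 321 would end up in the good list and everything else in the bad list. The same condition would still need to be expressed as a filter in the split recipe itself.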