In a split recipe, how do I split on a string value coming in an integer-type column?

rsingamsetty
rsingamsetty Dataiku DSS Core Designer, Dataiku DSS & SQL, Registered Posts: 18 ✭✭✭✭✭

I am trying to split a dataset into two based on an id column. Most values in it are numbers, but some string values (names) also come in from the source. I want to send the good data (rows where id is a number) to dataset 1 and all rows with an invalid id value to dataset 2. How can I do this?

When I use the split recipe, it automatically treats the id column as an integer and only offers comparison operators in the dropdown (e.g. ==, <=). It does not let me type in a string value or a regex match.

Answers

  • rsingamsetty
    rsingamsetty Dataiku DSS Core Designer, Dataiku DSS & SQL, Registered Posts: 18 ✭✭✭✭✭

    Example of data values in the id column:

    id
    123
    435
    ABC
    321

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    Hi,

    You can use the mode where you define one filter for the "good" dataset and one for the "bad" dataset, with each filter written as a formula. For example, with an id column named "a", this gives:

    Screenshot 2020-05-06 at 09.35.26.png

    Regards,

    Frederic

  • rsingamsetty
    rsingamsetty Dataiku DSS Core Designer, Dataiku DSS & SQL, Registered Posts: 18 ✭✭✭✭✭

    Thank you, Frederic, for the quick reply. Is there a way to use a regex for that filter?

    If any kind of non-numeric value comes into that column (e.g. alphanumeric values, plain strings, spaces, nulls), I want to route all of it to the bad dataset in one shot, so that only purely numeric values (e.g. those matching the regex ^\d+$) go to the good dataset.
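
For reference, the regex-based split asked about above can be sketched in plain Python. This is only an illustration of the logic, not Dataiku formula syntax and not the filter from the screenshot: the helper names (`is_valid_id`, `split_rows`) and the sample rows are hypothetical, and the sketch assumes the id values are read as strings (or None for nulls).

```python
import re

# Digits only, matched against the whole value (equivalent in intent to ^\d+$).
# re.ASCII restricts \d to 0-9 so other Unicode digits are not accepted.
NUMERIC_ID = re.compile(r"\d+", re.ASCII)


def is_valid_id(value):
    """Return True only when value is a string made entirely of digits."""
    return isinstance(value, str) and NUMERIC_ID.fullmatch(value) is not None


def split_rows(rows, column="id"):
    """Split rows (dicts) into (good, bad) lists based on the given column."""
    good, bad = [], []
    for row in rows:
        # Nulls, blanks, names and alphanumeric values all fall into "bad".
        (good if is_valid_id(row.get(column)) else bad).append(row)
    return good, bad


# Hypothetical sample: the values from the thread plus a few other bad cases.
sample = [{"id": "123"}, {"id": "435"}, {"id": "ABC"}, {"id": "321"},
          {"id": "12A"}, {"id": " "}, {"id": None}]

good, bad = split_rows(sample)
print("good:", [r["id"] for r in good])
print("bad:", [r["id"] for r in bad])
```

Because the two outputs are defined by a single predicate and its negation, every row lands in exactly one of them, which is the same property the "good" and "bad" filters need to have in the split recipe.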
