Flagging Records With Invalid Values

AnalyticsAnton

I am a new Dataiku user, and I'd like some advice on the best way to leverage the tool to flag records with invalid values.

Example: I have a column called "Fruit", and I expect only the following values in it: 'Apples', 'Oranges', and 'Grapes'. I want to flag, in a separate column, any record whose value is not one of these.
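The rule itself is a set-membership test: a row is invalid when its "Fruit" value is not in the allowed set. A minimal sketch in plain Python, assuming rows are dicts. The names `VALID_FRUITS`, `flag_invalid_fruit`, and the output column `fruit_is_invalid` are hypothetical. In pandas, `~df["Fruit"].isin(VALID_FRUITS)` expresses the same check.

```python
# Hypothetical allowed list, taken from the example above.
VALID_FRUITS = {"Apples", "Oranges", "Grapes"}


def flag_invalid_fruit(rows, column="Fruit", flag_column="fruit_is_invalid"):
    """Return copies of the rows with a boolean flag column added.

    A row is flagged (True) when its value in `column` is missing or is not
    exactly one of VALID_FRUITS. The comparison is case-sensitive.
    """
    flagged = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row[flag_column] = row.get(column) not in VALID_FRUITS
        flagged.append(new_row)
    return flagged


rows = [
    {"id": 1, "Fruit": "Apples"},
    {"id": 2, "Fruit": "Banana"},
    {"id": 3, "Fruit": None},
    {"id": 4, "Fruit": "grapes"},
]
for row in flag_invalid_fruit(rows):
    print(row["id"], row["Fruit"], row["fruit_is_invalid"])
```

Because the match is exact, "grapes" (lowercase) is flagged. Normalizing case or trimming whitespace before the test is a separate decision.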

Some options that I've considered are:

  1. Flag invalid rows recipe: this doesn't seem to work because it only checks data types.
  2. Flag rows on value recipe: this doesn't seem to work because it flags rows that match a value, not rows that fail to match one.
  3. Joins: set up the valid values as a separate dataset (b), join it against the primary dataset (a), add a flag in the output dataset (c) to any record where the key from (b) is null, and then rejoin (c) back to dataset (a).
  4. Nasty if formula: write a nested if formula that checks each record against all of the valid values.
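Option 3 is essentially a left join followed by a null check. A minimal sketch of that logic in plain Python, assuming both datasets are lists of dicts keyed on "Fruit". The function name `left_join_flag` and the flag column are hypothetical. A left join keeps every row of the left dataset (a), so the flag can be attached to (a)'s rows directly. It also assumes the valid-values dataset (b) has one row per value; duplicates in a real join would multiply rows, whereas this dict-based lookup simply collapses them.

```python
def left_join_flag(primary, reference, key="Fruit", flag_column="fruit_is_invalid"):
    """Left-join `primary` (a) to `reference` (b) on `key` and flag non-matches."""
    # Index the reference dataset (b) by key so each lookup is a dict access.
    ref_index = {r[key]: r for r in reference}
    output = []
    for row in primary:
        # A missing match plays the role of b's key being null after the join.
        match = ref_index.get(row.get(key))
        new_row = dict(row)
        new_row[flag_column] = match is None
        output.append(new_row)
    return output


primary_a = [{"id": 1, "Fruit": "Apples"}, {"id": 2, "Fruit": "Kiwi"}]
valid_b = [{"Fruit": "Apples"}, {"Fruit": "Oranges"}, {"Fruit": "Grapes"}]

for row in left_join_flag(primary_a, valid_b):
    print(row)
```

Keeping the valid values in their own dataset means the list can change without editing any formula, which is the main advantage of this option over the nested if.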

I'm sure this can be done in Python too, but I am not fluent in Python, so I am hoping to find an easier alternative.

Any recommendations based on how others have handled this use case?
