Flagging Records With Invalid Values

Solved!
AnalyticsAnton
Level 1
I am a new Dataiku user, and I'd like some advice on the best way to leverage the tool to flag records with invalid values. 

Example: I have a column called "Fruit", and I expect the following values in that column: 'Apples', 'Oranges', and 'Grapes'. I want to flag, in a separate column, any records that don't contain one of these values.

Some options that I've considered are:

  1. Flag invalid rows recipe - this doesn't seem to work because it only checks for data types.
  2. Flag rows on value recipe - this doesn't seem to work because it matches on a value and can't flag the rows that don't match.
  3. Joins - set up the valid values as a separate data source (b), join it against the primary data source (a), add a flag in the output data source (c) to any records where the key from (b) is null, and then rejoin (c) back to (a).
  4. Nasty if formula - write a nested if formula that checks each record against all of the valid values.
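
For reference, the logic behind option 3 can be sketched in a few lines of plain Python. This is only an illustration of the left-join idea, not a Dataiku recipe: the sample rows, the `id` column, and the `invalid_fruit` flag name are all made up for the example.

```python
# Reference "data source" (b): the valid values, keyed by fruit name.
valid_values = [{"Fruit": "Apples"}, {"Fruit": "Oranges"}, {"Fruit": "Grapes"}]

# Primary "data source" (a), with made-up sample rows.
records = [
    {"id": 1, "Fruit": "Apples"},
    {"id": 2, "Fruit": "Kiwi"},
    {"id": 3, "Fruit": "Grapes"},
]

# Index (b) by its key so each lookup behaves like a left join from (a) to (b).
valid_keys = {row["Fruit"] for row in valid_values}

# A key that is missing from (b) is what the join would surface as null,
# so those records get the flag. No second join back to (a) is needed.
flagged = [{**rec, "invalid_fruit": rec["Fruit"] not in valid_keys} for rec in records]

for row in flagged:
    print(row)
```

Because the flag is computed directly on the rows of (a), the rejoin step described in option 3 becomes unnecessary in this form.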

I'm sure this can be done in Python too, but I am not fluent in Python, so I am hoping to find an easier alternative.
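
For readers who do want the Python route, here is a minimal sketch using only the standard library. It assumes the data is available as CSV text; the function name `flag_invalid_fruit`, the `Fruit_invalid` column name, and the sample rows are hypothetical. In a Dataiku Python recipe the input would normally come from a dataset rather than a string, so treat this as an outline of the check rather than a ready-made recipe.

```python
import csv
import io

# Valid values taken from the example in the question.
VALID_FRUITS = {"Apples", "Oranges", "Grapes"}


def flag_invalid_fruit(csv_text, column="Fruit", flag_column="Fruit_invalid"):
    """Return CSV text with an extra column: 1 if the value is not valid, else 0."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(reader.fieldnames) + [flag_column],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        # Exact, case-sensitive match; empty cells are flagged as invalid too.
        row[flag_column] = 0 if row[column] in VALID_FRUITS else 1
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data: one valid value, one unknown, one empty, one wrong case.
sample = "id,Fruit\n1,Apples\n2,Banana\n3,\n4,grapes\n"
print(flag_invalid_fruit(sample))
```

Note that the match is case-sensitive, so 'grapes' is flagged here. Whether that is the desired behavior depends on the data; normalizing case before the check is one possible adjustment.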

Any recommendations based on how others have handled this use case?

1 Solution
dimitri
Dataiker

Hi @AnalyticsAnton ,

You can achieve this with user-defined meanings.

configure_meanings.png

 

With a custom meaning, you can define a values list that specifies which values are valid.

meanings_fruits.png

invalid_fruits.png

 

Once the meaning is assigned to the column, you can use any preparation processor that works on invalid rows or cells.

meanings_prepare.png

 

For more details about the difference between storage types and meanings, see the Definitions page of the reference documentation.

Have a great day!
