Flagging Records With Invalid Values
I am a new Dataiku user, and I'd like some advice on the best way to leverage the tool to flag records with invalid values.
Example: I have a column called "Fruit," and I expect the following values for that column: 'Apples', 'Oranges', and 'Grapes'. I want to flag, in a separate column, any records that don't contain one of these values.
Some options that I've considered are:
- Flag invalid rows recipe - this doesn't seem to work because it only checks for data types
- Flag rows on value recipe - this doesn't seem to work since it is matching on a value and doesn't flag things that don't match
- Joins - setting up the valid values as a separate data source (b), joining the primary data source (a) against it, flagging in the output data source (c) any records where (b)'s key is null, and then rejoining (c) back to data source (a)
- Nasty if formula - make a nested if formula that checks the records against all of the values
I'm sure this can be done in Python too, but I am not fluent in Python, so I am hoping to find an easy alternative.
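For readers who are comfortable with Python, here is a minimal sketch of what that route might look like using pandas. It is only an illustration under assumptions: the names `VALID_FRUITS`, `flag_invalid`, `flag_invalid_by_join`, and the `Fruit_invalid` flag column are hypothetical, the sample data is made up, and loading or writing the dataset is left out. The first function checks membership in the list directly; the second mirrors the join option above by using a left join with a merge indicator.

```python
import pandas as pd

# Hypothetical list of accepted values, taken from the example in the question.
VALID_FRUITS = ["Apples", "Oranges", "Grapes"]


def flag_invalid(df, column="Fruit", valid_values=VALID_FRUITS,
                 flag_column="Fruit_invalid"):
    """Return a copy of df with a boolean column that is True
    wherever `column` is not one of `valid_values` (missing values included)."""
    out = df.copy()
    out[flag_column] = ~out[column].isin(valid_values)
    return out


def flag_invalid_by_join(df, valid_df, column="Fruit",
                         flag_column="Fruit_invalid"):
    """Same result via a left join against a table of valid values:
    rows with no match in valid_df are flagged."""
    lookup = valid_df[[column]].drop_duplicates()
    merged = df.merge(lookup, on=column, how="left", indicator=True)
    merged[flag_column] = merged["_merge"] == "left_only"
    return merged.drop(columns="_merge")


# Made-up sample records for illustration only.
records = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Fruit": ["Apples", "Bananas", "Grapes", None],
})

flagged = flag_invalid(records)
flagged_by_join = flag_invalid_by_join(
    records, pd.DataFrame({"Fruit": VALID_FRUITS})
)
print(flagged)
```

Both functions compare values exactly, so differences in case or surrounding whitespace (for example 'apples' or 'Grapes ') would be flagged as invalid unless the column is normalized first.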
Any recommendations based on how others have handled this use case?
Best Answer
Hi @AnalyticsAnton,
What you're looking for is achievable using user-defined meanings.
With a custom meaning, you can define a list of values that specifies which values are valid.
Once that meaning is set on the column, you can use any preparation processor that works on invalid rows or cells, and values outside the list will be treated as invalid.
For more details about the difference between the storage types and the meanings, you can refer to the Definitions page of the reference documentation.
Have a great day!