How to know if a recipe has a partitioned output dataset using APIs?
nmadhu20
Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 35 Neuron
Hi Team,
We have been trying to write code that identifies recipes which have a partitioned output dataset. We tried a couple of approaches to achieve this:
- Check for the 'map' key in the recipe definition. This is not reliable, as we have identified instances where this key is present even if the output is not partitioned.
  Code used:
      setting = recipe.get_settings()
      setting.get_recipe_raw_definition()
- Check for partitions in a dataset, then extract its parent recipe.
  Code used:
      dt_info = project.get_dataset('dataset_name')
      if 'NP' not in dt_info.list_partitions()[0]:
          partition = True
The drawback of the second approach is that list_partitions() takes too long to execute.
Could you please help us identify whether there is another way, or a direct API or identifier in the recipe details, that tells us whether a recipe has a partitioned output dataset?
Best Answer
Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
Hi @nmadhu20,
I think a modification that combines parts of the first and second approaches will work: read the partitioning field from the dataset settings to determine whether the dataset is partitioned.
Here is an example:

    for dataset in project.list_datasets(as_type='object'):
        settings = dataset.get_settings()
        raw_settings = settings.get_raw()
        # get the 'partitioning' field and check that length of dimensions is > 0
        if 'partitioning' in raw_settings:
            if len(raw_settings['partitioning']['dimensions']) > 0:
                # check if this is an output dataset to a recipe
                dataset_usages = dataset.get_usages()
                for usage in dataset_usages:
                    if usage['type'] == 'RECIPE_OUTPUT':
                        print('dataset: ', dataset.name, 'recipe: ', usage['objectId'])
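The same check can also be factored into small helpers, which makes it easier to reuse and to test without a DSS instance. This is only a sketch: the helper names `is_partitioned` and `recipes_with_partitioned_output` are hypothetical, and it assumes the same fields as the example above (`partitioning` and `dimensions` in the raw dataset settings, `type` and `objectId` in each usage entry).

```python
def is_partitioned(raw_settings):
    """Return True if raw dataset settings declare at least one partitioning dimension.

    Assumes the layout used in the example above:
    raw_settings['partitioning']['dimensions'] is a list.
    Missing or empty fields are treated as "not partitioned".
    """
    partitioning = raw_settings.get('partitioning') or {}
    dimensions = partitioning.get('dimensions') or []
    return len(dimensions) > 0


def recipes_with_partitioned_output(datasets):
    """Map each recipe name to the partitioned datasets it writes to.

    `datasets` is any iterable of objects exposing .name,
    .get_settings().get_raw() and .get_usages(), for example the result of
    project.list_datasets(as_type='object').
    """
    result = {}
    for dataset in datasets:
        if not is_partitioned(dataset.get_settings().get_raw()):
            continue  # skip non-partitioned datasets before fetching usages
        for usage in dataset.get_usages():
            # Assumption: usages of type 'RECIPE_OUTPUT' carry the recipe name in 'objectId'
            if usage.get('type') == 'RECIPE_OUTPUT':
                result.setdefault(usage['objectId'], []).append(dataset.name)
    return result
```

It could then be called as `recipes_with_partitioned_output(project.list_datasets(as_type='object'))`. Because it only reads dataset settings, it should avoid the cost of listing the partitions themselves.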
Let me know if you have any questions about this approach.
Thanks,
Sarina
Answers
Yes, thank you, this worked for me.