
How to know if a recipe has partitioned dataset using APIs?

Solved!
nmadhu20
Neuron

Hi Team,

We have been trying to write code that identifies recipes which have a partitioned output dataset. We tried a couple of approaches:

  1. Checking for the 'map' key in the recipe's raw definition. This is not reliable, as we have found instances where the key is present even though the output is not partitioned.
    Code used:
    setting = recipe.get_settings()
    setting.get_recipe_raw_definition()
  2. Checking for partitions in a dataset, then finding its parent recipe.
    Code used:
    dt_info = project.get_dataset('dataset_name')
    if 'NP' not in dt_info.list_partitions()[0]:
        partition = True

The drawback of the second approach is that list_partitions() takes too long to execute.

Could you please help us identify whether there is another way, or a direct API/identifier in the recipe details, that tells us whether a recipe has a partitioned dataset?

1 Solution
SarinaS
Dataiker

Hi @nmadhu20,

I think a combination of your first and second approaches will work: read the partitioning field from the dataset settings to determine whether the dataset is partitioned, then check whether that dataset is the output of a recipe.

Here is an example:

# Assumes `project` is a project handle, e.g. dataiku.api_client().get_default_project()
for dataset in project.list_datasets(as_type='object'):
    raw_settings = dataset.get_settings().get_raw()
    # A dataset is partitioned if its 'partitioning' field has at least one dimension
    dimensions = raw_settings.get('partitioning', {}).get('dimensions', [])
    if len(dimensions) > 0:
        # Check whether this dataset is the output of a recipe
        for usage in dataset.get_usages():
            if usage['type'] == 'RECIPE_OUTPUT':
                print('dataset:', dataset.name, 'recipe:', usage['objectId'])
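
If you need the same check in several places, the two tests can be pulled into small helpers that only work on plain dicts, which makes them easy to try out without a DSS connection. This is just a sketch: `is_partitioned` and `producing_recipes` are hypothetical names, and the sample dicts are made up, assuming the same 'partitioning' / 'dimensions' and usage 'type' / 'objectId' layout as in the loop above.

```python
def is_partitioned(raw_settings):
    """Return True if the raw dataset settings declare at least one partition dimension."""
    partitioning = raw_settings.get('partitioning') or {}
    return len(partitioning.get('dimensions') or []) > 0


def producing_recipes(usages):
    """Return the names of the recipes that list the dataset as an output."""
    return [u['objectId'] for u in usages if u.get('type') == 'RECIPE_OUTPUT']


# Hypothetical sample data, standing in for dataset.get_settings().get_raw()
# and dataset.get_usages()
partitioned = {'partitioning': {'dimensions': [{'name': 'day', 'type': 'time'}]}}
unpartitioned = {'partitioning': {'dimensions': []}}
usages = [
    {'type': 'RECIPE_INPUT', 'objectId': 'compute_report'},
    {'type': 'RECIPE_OUTPUT', 'objectId': 'compute_sales_by_day'},
]

print(is_partitioned(partitioned))
print(is_partitioned(unpartitioned))
print(producing_recipes(usages))
```

In the real loop you would pass `dataset.get_settings().get_raw()` to the first helper and `dataset.get_usages()` to the second. Because only the settings are read, this avoids the slow list_partitions() call.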


Let me know if you have any questions about this approach. 

Thanks,
Sarina

nmadhu20
Neuron
Author

Yes, thank you, this worked for me.