How to know if a recipe has a partitioned output dataset using APIs?

nmadhu20 Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 35 Neuron

Hi Team,

We have been trying to write code that identifies recipes which have a partitioned output dataset. We tried a couple of approaches to achieve this:

  1. Check for the 'map' key in the recipe's raw definition. This is not reliable, as we have found instances where the key is present even though the output is not partitioned.
    Code used:
    settings = recipe.get_settings()
    settings.get_recipe_raw_definition()
  2. Check for partitions in a dataset, then extract its parent recipe.
    dt_info = project.get_dataset('dataset_name')
    if 'NP' not in dt_info.list_partitions()[0]:
        partition = True

The drawback of the second approach is that list_partitions() takes too long to execute.

Could you please help us identify whether there is another way, such as a direct API or an identifier in the recipe details, to tell whether a recipe has a partitioned output dataset?


Best Answer

  • Sarina
    Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 315 Dataiker
    edited July 17 Answer ✓

    Hi @nmadhu20,

    I think a combination of your first and second approaches will work: read the 'partitioning' field from the dataset settings to determine whether the dataset is partitioned, then look up the recipe that produces it. This avoids calling list_partitions().

    Here is an example:

    import dataiku

    # assumes this runs inside DSS (for example in a notebook), where a default project is available
    client = dataiku.api_client()
    project = client.get_default_project()

    for dataset in project.list_datasets(as_type='object'):
        raw_settings = dataset.get_settings().get_raw()
        # a dataset is partitioned when its 'partitioning' field has at least one dimension
        dimensions = raw_settings.get('partitioning', {}).get('dimensions', [])
        if len(dimensions) > 0:
            # check whether this dataset is the output of a recipe
            for usage in dataset.get_usages():
                if usage['type'] == 'RECIPE_OUTPUT':
                    print('dataset:', dataset.name, 'recipe:', usage['objectId'])
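
If you need the result grouped by recipe rather than printed, the same check can be factored into two small pure functions. This is only a sketch: the helper names is_partitioned and recipes_with_partitioned_outputs are hypothetical (they are not part of the Dataiku API), and the sample settings and usages below are made-up data shaped like the fields used in the loop above.

```python
from collections import defaultdict


def is_partitioned(raw_settings):
    """Return True if raw dataset settings declare at least one partitioning dimension."""
    partitioning = raw_settings.get("partitioning") or {}
    return len(partitioning.get("dimensions") or []) > 0


def recipes_with_partitioned_outputs(datasets):
    """Map each recipe name to the sorted list of partitioned datasets it writes.

    `datasets` is an iterable of (dataset_name, raw_settings, usages) tuples.
    """
    result = defaultdict(list)
    for name, raw_settings, usages in datasets:
        if not is_partitioned(raw_settings):
            continue
        for usage in usages:
            if usage.get("type") == "RECIPE_OUTPUT":
                result[usage["objectId"]].append(name)
    return {recipe: sorted(names) for recipe, names in result.items()}


# Hypothetical sample data, for illustration only
sample = [
    (
        "sales_by_day",
        {"partitioning": {"dimensions": [{"name": "day", "type": "time"}]}},
        [
            {"type": "RECIPE_OUTPUT", "objectId": "compute_sales_by_day"},
            {"type": "RECIPE_INPUT", "objectId": "compute_report"},
        ],
    ),
    (
        "customers",
        {"partitioning": {"dimensions": []}},
        [{"type": "RECIPE_OUTPUT", "objectId": "compute_customers"}],
    ),
]

print(recipes_with_partitioned_outputs(sample))
```

In DSS, each tuple could be built as (dataset.name, dataset.get_settings().get_raw(), dataset.get_usages()) inside the loop over project.list_datasets(as_type='object').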


    Let me know if you have any questions about this approach.

    Thanks,
    Sarina
