Run a recipe for all partitions available

Noah
Noah Registered Posts: 33 ✭✭✭✭

When I run a recipe, how do I run it for all the partitions of one variable?

In the photo below, I would like to run this code recipe for all partitions in RW_Index.

Answers

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker
    edited July 17

    Hi,

    Currently, DSS expects an explicit list of partitions to build. If you want to run a recipe on all partitions, you can use a scenario with an "Execute Python code" step. Here is one example of how you could accomplish this:

    from dataiku.scenario import Scenario
    import dataiku
    
    scenario = Scenario()
    dataset = dataiku.Dataset("input_dataset_name")
    
    partitions = dataset.list_partitions() # get all partitions from input dataset
    
    # To build every available partition in all dimensions:
    # partitions_str = ','.join(partitions)
    
    # When some dimensions are fixed and another dimension should cover ALL values,
    # keep only the partitions that start with the fixed values, here '2020Q4|Pricing|L4L_Monthly'.
    partitions_str = ','.join([item for item in partitions if item.startswith('2020Q4|Pricing|L4L_Monthly')])
    
    scenario.build_dataset("output_good", partitions=partitions_str)
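
The prefix filter in the answer above can be generalized so that any dimension, not only the last one, can be left open. The following is a minimal sketch, not part of the original answer: it assumes partition identifiers are strings whose dimensions are separated by `|`, as in the example above. The helper name `select_partitions` and the sample identifiers are hypothetical.

```python
def select_partitions(partitions, wanted):
    """Return a comma-separated spec of the partitions matching `wanted`.

    `wanted` has one entry per dimension: a fixed value, or None for "any value".
    """
    selected = []
    for partition_id in partitions:
        values = partition_id.split("|")
        if len(values) != len(wanted):
            continue  # skip identifiers with an unexpected number of dimensions
        if all(w is None or w == v for w, v in zip(wanted, values)):
            selected.append(partition_id)
    return ",".join(selected)


# Made-up identifiers standing in for the result of dataset.list_partitions()
sample = [
    "2020Q4|Pricing|L4L_Monthly|1",
    "2020Q4|Pricing|L4L_Monthly|2",
    "2020Q3|Pricing|L4L_Monthly|1",
]
spec = select_partitions(sample, ["2020Q4", "Pricing", "L4L_Monthly", None])
print(spec)  # 2020Q4|Pricing|L4L_Monthly|1,2020Q4|Pricing|L4L_Monthly|2
```

Because each dimension is compared exactly, a value such as `L4L_Monthly2` would not be picked up by accident, which a plain `startswith` check would allow. The resulting string could then be passed as the `partitions` argument of `scenario.build_dataset`.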

  • TheMLEngineer
    TheMLEngineer Registered Posts: 25

    Is there no way to run the spark engine for all partitions using visual recipes?

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker

    Hi @TheMLEngineer,

    There is no direct way to build all partitions visually; a feature request has been submitted.

    When you redispatch partitions, all partitions are redispatched by default.
    If you have a date partition, you can set a date range to cover all partitions.
    If you have a discrete dimension, you can set a variable containing all of your partitions and build it that way.

    One of the benefits of partitions is not having to build all of them every time. When building them for the first time, you can indeed use the scenario or the other methods mentioned above.

    Kind Regards,
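
For the discrete case mentioned in the answer above, one way to keep a variable holding all partitions is to write the list into a project variable from code. This is only a sketch under assumptions: the helper `store_partitions_variable` and the variable name `all_partitions` are hypothetical, and the helper only relies on the project handle exposing `get_variables()` and `set_variables()`.

```python
def store_partitions_variable(project, name, partitions):
    """Save a comma-separated partition list as a standard project variable."""
    variables = project.get_variables()  # expected: dict with "standard" and "local" keys
    variables.setdefault("standard", {})[name] = ",".join(partitions)
    project.set_variables(variables)
    return variables["standard"][name]


# Inside DSS this might be wired up roughly as follows (assumed, not run here):
#   import dataiku
#   project = dataiku.api_client().get_default_project()
#   partitions = dataiku.Dataset("input_dataset_name").list_partitions()
#   store_partitions_variable(project, "all_partitions", partitions)
```

The stored variable could then be referenced wherever a partition list is expected when building the dataset. Rerunning the helper after new partitions appear keeps the variable up to date.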

  • TheMLEngineer
    TheMLEngineer Registered Posts: 25

    Thanks @AlexT, this is helpful. I used a PySpark recipe to run the partitions in code.
