
Run a recipe for all partitions available

nshapir2
Level 2
Run a recipe for all partitions available

When I run a recipe, how do I run it for all partitions of one dimension?

 

In the screenshot below, I would like to run this code recipe for all partitions of RW_Index.

4 Replies
AlexT
Dataiker

Hi,

Currently, DSS expects an explicit list of partitions to build. If you want to run a recipe on all partitions, you can use a scenario with an Execute Python code step. Here is one example of how you could accomplish this:

from dataiku.scenario import Scenario
import dataiku

scenario = Scenario()
dataset = dataiku.Dataset("input_dataset_name")

# Get all partition identifiers from the input dataset
partitions = dataset.list_partitions()

# To build all available partitions across all dimensions:
# partitions_str = ','.join(partitions)

# To fix some dimensions and build all values of the remaining ones,
# keep only the partitions whose identifier starts with the fixed prefix,
# e.g. '2020Q4|Pricing|L4L_Monthly':
partitions_str = ','.join([item for item in partitions if item.startswith('2020Q4|Pricing|L4L_Monthly')])

scenario.build_dataset("output_good", partitions=partitions_str)
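The filter-and-join step above can be checked standalone, without a DSS instance. The partition identifiers below are hypothetical, just following the `dim1|dim2|dim3` format used in the example:

```python
# Hypothetical partition identifiers in DSS's pipe-separated format
partitions = [
    "2020Q4|Pricing|L4L_Monthly|North",
    "2020Q4|Pricing|L4L_Monthly|South",
    "2021Q1|Pricing|L4L_Monthly|North",
]

# Keep only partitions whose leading dimensions match a fixed prefix,
# then join them into the comma-separated string build_dataset expects
partitions_str = ",".join(
    p for p in partitions if p.startswith("2020Q4|Pricing|L4L_Monthly")
)

print(partitions_str)
# 2020Q4|Pricing|L4L_Monthly|North,2020Q4|Pricing|L4L_Monthly|South
```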

 

TheMLEngineer
Level 2

Is there no way to run the Spark engine for all partitions using visual recipes?

AlexT
Dataiker

Hi @TheMLEngineer,

There is no direct way to build all partitions visually; a feature request has been submitted for this.

When you redispatch partitions, all partitions are redispatched by default.
If you have a date partition, you can set a date range that covers all partitions.
If you have a discrete partition, you can set a variable listing all of your partitions and build with that.

One of the benefits of partitioning is not having to build all partitions every time. When building them for the first time, you can indeed use the scenario or the other methods mentioned above.
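For the date-partition case, the comma-separated partition list covering a range can be generated with plain Python and stored in such a variable. A minimal sketch, assuming day-level partitions in `YYYY-MM-DD` format (the function name and dates are illustrative, not a DSS API):

```python
from datetime import date, timedelta

def daily_partitions(start, end):
    """Comma-separated day-level partition ids from start to end, inclusive."""
    days = (end - start).days + 1
    return ",".join(
        (start + timedelta(days=i)).strftime("%Y-%m-%d") for i in range(days)
    )

partitions_str = daily_partitions(date(2024, 1, 1), date(2024, 1, 3))
print(partitions_str)  # 2024-01-01,2024-01-02,2024-01-03
```

The resulting string could then be saved as a project variable and referenced in the build step.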

Kind Regards,

TheMLEngineer
Level 2

Thanks @AlexT, this is helpful. I used a PySpark recipe to handle the partitions in code.
