
Run a recipe for all partitions available

nshapir2
Level 2
Run a recipe for all partitions available

When I run a recipe, how do I run it for all partitions of one dimension?

 

In the screenshot below, I would like to run this code recipe for all partitions of RW_Index.

4 Replies
AlexT
Dataiker

Hi,

Currently, DSS expects an explicit list of partitions to build. If you want to run a recipe on all partitions, you can use a scenario with an Execute Python code step. Here is one example of how you could accomplish this:

from dataiku.scenario import Scenario
import dataiku

scenario = Scenario()
dataset = dataiku.Dataset("input_dataset_name")

# Get all partition identifiers from the input dataset
partitions = dataset.list_partitions()

# To build all available partitions across all dimensions:
# partitions_str = ','.join(partitions)

# To fix some dimensions and build all values of the remaining ones,
# keep only the partitions whose identifier starts with the fixed prefix,
# e.g. '2020Q4|Pricing|L4L_Monthly':
partitions_str = ','.join([item for item in partitions if item.startswith('2020Q4|Pricing|L4L_Monthly')])

scenario.build_dataset("output_good", partitions=partitions_str)
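The filter-and-join step above can be checked standalone, without a DSS instance. The partition identifiers below are hypothetical, just following the `dim1|dim2|dim3` format used in the example:

```python
# Hypothetical partition identifiers in DSS's pipe-separated format
partitions = [
    "2020Q4|Pricing|L4L_Monthly|North",
    "2020Q4|Pricing|L4L_Monthly|South",
    "2021Q1|Pricing|L4L_Monthly|North",
]

# Keep only partitions whose leading dimensions match a fixed prefix,
# then join them into the comma-separated string build_dataset expects
partitions_str = ",".join(
    p for p in partitions if p.startswith("2020Q4|Pricing|L4L_Monthly")
)

print(partitions_str)
# 2020Q4|Pricing|L4L_Monthly|North,2020Q4|Pricing|L4L_Monthly|South
```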

 

TheMLEngineer
Level 2

Is there no way to run the Spark engine for all partitions using visual recipes?

AlexT
Dataiker

Hi @TheMLEngineer,

There is no direct way to build all partitions visually; a feature request has been submitted for this.

When you redispatch partitions, all partitions are redispatched by default.
If you have a date partition, you can set a date range that covers all partitions.
If you have a discrete partition, you can set a variable listing all of your partitions and build with that.

One of the benefits of partitioning is not having to build all partitions every time. When building them for the first time, you can indeed use the scenario or the other methods mentioned above.
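For the date-partition case, the comma-separated partition list covering a range can be generated with plain Python and stored in such a variable. A minimal sketch, assuming day-level partitions in `YYYY-MM-DD` format (the function name and dates are illustrative, not a DSS API):

```python
from datetime import date, timedelta

def daily_partitions(start, end):
    """Comma-separated day-level partition ids from start to end, inclusive."""
    days = (end - start).days + 1
    return ",".join(
        (start + timedelta(days=i)).strftime("%Y-%m-%d") for i in range(days)
    )

partitions_str = daily_partitions(date(2024, 1, 1), date(2024, 1, 3))
print(partitions_str)  # 2024-01-01,2024-01-02,2024-01-03
```

The resulting string could then be saved as a project variable and referenced in the build step.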

Kind Regards,

TheMLEngineer
Level 2

Thanks @AlexT, this is helpful. I used a PySpark recipe to handle the partitions in code.
