Access the partition value in Pyspark Recipe

torbiks
Level 1

I have a table that is partitioned by date. How can I access the partition date in a PySpark recipe?


I tried the following code, but it does not recognize actual_date:

from pyspark.sql.functions import col

fct_pm_card.select("application_id", "product") \
           .filter(col("actual_date") <= end_date)

AlexT
Dataiker

Hi @torbiks ,

How is your dataset partitioned? Are you using Spark partitions, e.g. via repartition?

https://doc.dataiku.com/dss/latest/spark/datasets.html

DSS partitions and Spark partitions are different concepts.

You can't reference a DSS partition directly in Spark: when a DSS-partitioned Spark job runs, only the data for that partition is available, and the partition value itself is not exposed as a column.

If you need the partition value, you can add it back as a column using the Enrich with record context prepare recipe processor: https://doc.dataiku.com/dss/latest/preparation/processors/enrich-with-record-context.html
Then you will be able to use that column in PySpark.

