I have a table that is partitioned by date. How can I access the partition date in a PySpark recipe?
I tried the following code, but it does not recognize actual_date:
from pyspark.sql.functions import col

fct_pm_card.select("application_id", "product") \
    .filter(col('actual_date') <= end_date)
Hi @torbiks ,
How is your dataset partitioned? Are you using Spark partitions, e.g. repartition?
https://doc.dataiku.com/dss/latest/spark/datasets.html
DSS partitions and Spark partitions are different things.
You can't reference a DSS partition as a column in Spark, because when a DSS-partitioned Spark job runs, only the data of that partition is made available to the job; the partitioning value itself is not part of the dataset's columns.
If you need the partition value, you can add it back as a column using the "Enrich with record context" prepare recipe processor: https://doc.dataiku.com/dss/latest/preparation/processors/enrich-with-record-context.html
Then you would be able to use that column in PySpark.