Access the partition value in a PySpark recipe
I have a table that is partitioned by date. How can I access the partition date in a PySpark recipe?
I tried the following code, but it does not recognize actual_date:
from pyspark.sql.functions import col

fct_pm_card.select("application_id", "product") \
    .filter(col("actual_date") <= end_date)
Answers
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi @torbiks,
How is your dataset partitioned? Are you using Spark partitions, e.g. repartition?
https://doc.dataiku.com/dss/latest/spark/datasets.html
DSS partitions and Spark partitions are different things. You can't reference a DSS partition directly in Spark: when a DSS-partitioned Spark job runs, only the data of the partition being built is made available, and the partition value itself is not exposed as a column.
If you need the partition value, you can add it back as a column using the "Enrich with record context" prepare recipe processor: https://doc.dataiku.com/dss/latest/preparation/processors/enrich-with-record-context.html
Then you will be able to use that column in PySpark.