I have a table that is partitioned by date. How can I access the partition date in a PySpark recipe?
I tried the following code, but it does not recognize actual_date:
from pyspark.sql.functions import col

fct_pm_card.select("application_id", "product") \
    .filter(col('actual_date') <= end_date)
Hi @torbiks ,
How is your dataset partitioned? Are you using Spark partitions, e.g. repartition?
https://doc.dataiku.com/dss/latest/spark/datasets.html
DSS partitions and Spark partitions are different things.
You can't reference a DSS partition as a column in Spark, because when a DSS-partitioned Spark job runs, only the data of that partition is made available to the job; the partitioning value itself is not part of the dataset's columns.
If you need the partition value, you can add it back as a column using the "Enrich with record context" prepare recipe processor: https://doc.dataiku.com/dss/latest/preparation/processors/enrich-with-record-context.html
Then you would be able to use that column in PySpark.