I have an input dataset with 5 partitions, and I would like to use all of them in PySpark/SparkSQL. All five partitions are needed for a grouping that produces the overall count. I then need one specific partition (out of the five) so that I can report that partition along with the overall count.
Can anyone please help? Is there any way to refer to this specific partition as a column (maybe a partition identifier) from the code itself?
When connecting a dataset to a recipe, we are generally asked for the list of partitions to use as input. Here, though, I need to input all partitions and then pick out one of them from the code. This could also benefit performance/efficiency, which is why I am considering the partitioning method.