Redispatch-partitioning an S3 Parquet dataset by a column value - how to run this optimally?
Hi, I am working with data in S3 that is partitioned by the timestamp in the filename. Because the files contain unordered timestamp data, I need to repartition the dataset by a column value instead. I tried redispatch partitioning with a Sync recipe, as described in https://knowledge.dataiku.com/latest/courses/advanced-partitioning/hands-on-file-based-partitioning..... This approach didn't work for me: the redispatch runs out of memory even for two hours of data (~3 GB). Is there a way to run this on Spark or Snowflake?
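To make the goal concrete, here is a minimal pure-Python sketch of the regrouping I am after (the column name `event_date` and the sample rows are hypothetical, just for illustration): rows arrive mixed across time-named files, and I want them regrouped so that each output partition holds exactly the rows matching one column value.

```python
from collections import defaultdict

# Hypothetical rows read from several time-named files; the event_date
# column does not line up with the file each row came from.
rows = [
    {"event_date": "2024-01-01", "value": "a"},
    {"event_date": "2024-01-02", "value": "b"},
    {"event_date": "2024-01-01", "value": "c"},
]

def redispatch(rows, partition_col):
    """Regroup rows by the value of partition_col (the 'redispatch' step)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_col]].append(row)
    return dict(partitions)

parts = redispatch(rows, "event_date")
# Each key would become one output partition (e.g. one S3 prefix),
# e.g. parts["2024-01-01"] holds the two rows for that date.
```

An in-memory dict like this is exactly what blows up at ~3 GB, which is why I am hoping a Spark job (something along the lines of a partitioned write) or Snowflake could do the shuffle at scale instead.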