Hello dear community,
I have been experimenting with both @partitioning and @chunking of datasets, and have run into a problem.
My input dataset is partitioned, but each partition still contains a lot of records, so I want to process the partitions with @python in chunks.
In the documentation at https://doc.dataiku.com/dss/latest/python-api/datasets.html I found how to write in chunks, and I adjusted the example slightly so that the schema is written on the first chunk only. However, this does not seem to work correctly in combination with partitioning, because the script is then run 5 times in parallel.
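For reference, the pattern I adapted looks roughly like this. This is a plain-pandas sketch of the control flow only: the chunk iterator and the output sink stand in for the DSS dataset API, and `process_chunk` is a placeholder for my actual recipe logic.

```python
import io
import pandas as pd

def process_chunk(chunk):
    # Placeholder for the real per-chunk recipe logic.
    chunk = chunk.copy()
    chunk["value_doubled"] = chunk["value"] * 2
    return chunk

# Simulated chunked input; in DSS this would come from the input dataset.
source = pd.DataFrame({"value": range(10)})
chunks = (source.iloc[i:i + 4] for i in range(0, len(source), 4))

# Simulated output sink; in DSS this would be the output dataset's writer.
buffer = io.StringIO()
first_chunk = True
for chunk in chunks:
    result = process_chunk(chunk)
    if first_chunk:
        # Write the schema (here: the CSV header) with the first chunk only.
        # This is the step that can race when several partitions run in parallel.
        result.to_csv(buffer, index=False, header=True)
        first_chunk = False
    else:
        result.to_csv(buffer, index=False, header=False)

output = pd.read_csv(io.StringIO(buffer.getvalue()))
```

My suspicion is that when 5 partitions run this in parallel, several of them try to do the write-the-schema step at the same time.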
Thanks in advance...
Hi,
I don't think we understand what the exact issue is. What is the warning you encountered? Note that you can prevent parallel executions of the recipe in partitioning mode by setting the parallelism limit in the Advanced settings of the recipe.
The error message was: "table already exists but with an incompatible schema:"
In the run with 72 partitions, 2 of them failed while 70 ran without errors (see the attached screenshot).
I have included part of the log file for both a successful and a failed run.