Hello dear community,
I have been experimenting with both partitioning and chunking of datasets, and have run into a problem.
My input dataset is partitioned, but each partition still contains a lot of records, so I want to process the partitions in chunks with a Python recipe.
In the documentation at https://doc.dataiku.com/dss/latest/python-api/datasets.html I found how to write the output in chunks, and I adjusted the example slightly so that the schema is written only on the first chunk. However, this does not seem to work correctly in combination with partitioning, because the script is then run 5 times in parallel.
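Here is a minimal sketch of what my recipe does (the dataset names, chunk size, and per-chunk processing are placeholders):

```python
import dataiku

# Placeholder dataset names
input_ds = dataiku.Dataset("my_input")
output_ds = dataiku.Dataset("my_output")

writer = None
try:
    # Read the current partition of the input in chunks
    for chunk in input_ds.iter_dataframes(chunksize=100000):
        processed = chunk  # per-chunk processing goes here
        if writer is None:
            # On the first chunk only: set the output schema,
            # then open the streaming writer
            output_ds.write_schema_from_dataframe(processed)
            writer = output_ds.get_writer()
        writer.write_dataframe(processed)
finally:
    if writer is not None:
        writer.close()
```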
Thanks in advance...
Hi,
I don't think we understand what the exact issue is. What is the error message you encountered? Note that you can prevent parallel executions of the recipe in partitioned mode by setting the parallelism limit in the Advanced settings of the recipe.
The error message was: "table already exists but with an incompatible schema:"
In a run with 72 partitions, 2 of them failed while 70 completed without errors (see the screenshot attached).
I have included part of the log file for both a successful and a failed run.