Partitioning and chunking with python recipe

Hello dear community, 

I have been experimenting with both @partitioning and @chunking datasets, and have now run into a problem.

My input dataset is partitioned, but the partitions still contain a lot of records, so I want to process each partition with @python in chunks.

In the documentation at https://doc.dataiku.com/dss/latest/python-api/datasets.html I found how to write the chunks, and I adjusted the example slightly so that the schema is written along with the first chunk. However, this does not seem to work correctly in combination with partitioning, because the script is then run 5 times in parallel.
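
For readers following along, here is a minimal sketch of the kind of chunked read/write pattern described above. It is not the exact code from the recipe in question. The helper `process_in_chunks`, the function `run_recipe`, and the dataset names `my_input` and `my_output` are hypothetical. The sketch assumes the `iter_dataframes`, `write_schema_from_dataframe`, `get_writer` and `write_dataframe` calls from the documentation page linked above, and it sets the schema from the first processed chunk before the writer is opened.

```python
def process_in_chunks(chunks, transform, write_schema, open_writer):
    """Transform and write chunks one at a time.

    The schema is declared once, from the first transformed chunk,
    before the writer is opened. Returns the number of chunks written.
    """
    iterator = iter(chunks)
    first = next(iterator, None)
    if first is None:
        # Empty input: nothing to declare, nothing to write.
        return 0

    first_out = transform(first)
    write_schema(first_out)

    count = 0
    with open_writer() as writer:
        writer.write_dataframe(first_out)
        count = 1
        for chunk in iterator:
            writer.write_dataframe(transform(chunk))
            count += 1
    return count


def run_recipe(input_name="my_input", output_name="my_output", chunksize=50000):
    """Hypothetical wiring inside a DSS Python recipe (dataset names are placeholders)."""
    import dataiku  # imported here so the helper above can be exercised outside DSS

    src = dataiku.Dataset(input_name)
    dst = dataiku.Dataset(output_name)
    return process_in_chunks(
        src.iter_dataframes(chunksize=chunksize),
        transform=lambda df: df,  # placeholder: replace with the real per-chunk processing
        write_schema=dst.write_schema_from_dataframe,
        open_writer=dst.get_writer,
    )
```

Note that a schema derived from the first chunk depends on the data in that chunk, so separate partition runs could in principle each derive their own schema for the same output table. Whether that is what happens here is not established in this thread.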

Thanks in advance...

Author
By uncommenting the lines and rerunning the recipe several times, I was able to remove the warning. However, it does not really feel like the correct way of working.
Dataiker

Hi,

I don't think we understand what the exact issue is. What warning did you encounter? Note that you can prevent parallel executions of the recipe in partitioning mode by setting the parallelism limit in the Advanced settings of the recipe.

Author

The error message was: "table already exists but with an incompatible schema:"

In the run with 72 partitions, 2 failed while 70 ran without errors (see attached screenshot).

I have included part of the log file for both a successful and a failed run.
