
Partitioning and chunking with python recipe

MRvLuijpen

Hello dear community, 

I have been experimenting with both partitioning and chunking datasets, and have now run into a problem.

My input dataset is partitioned, but the partitions still contain a lot of records, so I want to process each partition with Python in chunks.

In the documentation at https://doc.dataiku.com/dss/latest/python-api/datasets.html I found how to write the output in chunks, and I adjusted the example slightly so that the schema is written on the first chunk only. However, this does not seem to work correctly in combination with partitioning, because the script is then run 5 times in parallel.
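For reference, the chunked read/write pattern from that documentation page looks roughly like the sketch below. The dataset names and the processing step are placeholders, not taken from the original post; the idea is that the schema is written once from the first processed chunk, and later chunks are appended through the same writer. Note that with a partitioned recipe, DSS launches one such script execution per partition being built, which is where concurrent schema writes can collide.

```python
import dataiku

# Placeholder dataset names -- replace with the recipe's actual input and output
input_dataset = dataiku.Dataset("input_partitioned")
output_dataset = dataiku.Dataset("output")

writer = None
try:
    # iter_dataframes() streams the (current partition of the) input in chunks
    for chunk_df in input_dataset.iter_dataframes(chunksize=100000):
        processed_df = chunk_df  # hypothetical processing step goes here

        if writer is None:
            # Write the output schema once, from the first processed chunk,
            # then open the writer used for all subsequent chunks
            output_dataset.write_schema_from_dataframe(processed_df)
            writer = output_dataset.get_writer()

        writer.write_dataframe(processed_df)
finally:
    if writer is not None:
        writer.close()
```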

Thanks in advance...

3 Replies
MRvLuijpen
Author
By uncommenting the lines and rerunning the recipe several times, I was able to remove the warning. However, it does not really feel like a correct way of working.
Clément_Stenac

Hi,

We're not sure we understand what the exact issue is. What is the warning you encountered? Note that you can prevent parallel executions of the recipe in partitioned mode by setting the parallelism limit in the Advanced settings of the recipe.

MRvLuijpen
Author

The error message was: "table already exists but with an incompatible schema:"

In the run with 72 partitions, 2 of them failed while 70 ran without errors (see attached screenshot).

I have included part of the log file for both a successful and a failed run.
