
Partitioning, parallelization and projections with Vertica

UserBird
Dataiker
Hi,

I use DSS v4.0.1.

I have a CSV input dataset partitioned by year in files (/%Y_dataset_src), and a recipe that prepares the data into a Vertica dataset (partitioned by %Y on a date column).

I need parallelization because this job is quite long (20h).

The partitioning is OK and the execution works well year by year.

When I build all the years (1970 to 2016), the job starts well and processes 4 partitions in parallel at a time.

But after 2 or 3 years it fails on most partitions (2 out of 3) with this error:

[Vertica][VJDBC](2083) ERROR: A Moveout operation is already in progress on projection public.dataset_super

I guess there is a problem with parallelizing the closing operation on the projection (similar to indexing), which is global and therefore doesn't support parallelization. I guess it is possible to run that projection step after all partitions are processed, but I don't know how to proceed.
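
To make the situation concrete, here is a minimal Python sketch of the kind of behaviour I am after: partitions still run 4 at a time, but a partition that hits the moveout conflict is retried instead of failing. This is only an illustration under assumptions, not something DSS or Vertica provides as written: `load_partition` is a hypothetical callable standing in for whatever loads one year, and the sketch assumes the conflict is transient and can be recognised from the error message text.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Text taken from the Vertica error reported above.
MOVEOUT_MARKER = "A Moveout operation is already in progress"


def run_with_retry(load_partition, year, max_attempts=5, base_delay=1.0,
                   sleep=time.sleep):
    """Call the hypothetical load_partition(year), retrying only on the
    moveout conflict. Any other error is re-raised immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            return load_partition(year)
        except Exception as exc:
            is_moveout = MOVEOUT_MARKER in str(exc)
            if not is_moveout or attempt == max_attempts:
                raise
            # Exponential backoff: 1x, 2x, 4x ... the base delay.
            sleep(base_delay * 2 ** (attempt - 1))


def build_all(load_partition, years, workers=4, **retry_kwargs):
    """Run all partitions with at most `workers` in flight at once and
    return a {year: result} mapping."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            year: pool.submit(run_with_retry, load_partition, year,
                              **retry_kwargs)
            for year in years
        }
        return {year: future.result() for year, future in futures.items()}
```

For my case the call would look like `build_all(load_partition, range(1970, 2017))`. Setting `workers=1` gives the sequential, year-by-year behaviour that already works.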