
Partitioning, parallelization and projections with Vertica

UserBird
Dataiker
Hi,

I use DSS v4.0.1.

I have a CSV input dataset partitioned by year in files (/%Y_dataset_src), and a recipe that prepares the data into a Vertica dataset (partitioned by %Y on a date column).

I need parallelization because this job is quite long (20h).

The partitioning is OK and the execution works well year by year.

When I build all the years (1970 to 2016), the job starts well and runs 4 partitions in parallel at a time.

But after 2 or 3 years it fails on most partitions (2 out of 3) with this error:

[Vertica][VJDBC](2083) ERROR: A Moveout operation is already in progress on projection public.dataset_super

I guess the problem comes from parallelizing the closing operation on the projection (similar to indexing), which is global and therefore does not support parallelization. I guess it is possible to run that projection step once, after all partitions are processed, but I don't know how to proceed.
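
To make the idea concrete, here is a minimal Python sketch of the ordering I have in mind: load the partitions 4 at a time, then run the global step exactly once at the end. This is only an illustration. `build_all`, `load_partition` and `finalize` are hypothetical placeholders, not DSS or Vertica functions, and I don't know whether DSS lets me separate the two steps like this.

```python
from concurrent.futures import ThreadPoolExecutor


def build_all(years, load_partition, finalize, workers=4):
    """Load every partition in parallel, then run one global step.

    load_partition and finalize are hypothetical callbacks standing in
    for "write one year of data" and "close the projection".
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every load and re-raises the first failure
        loaded = list(pool.map(load_partition, years))
    # Reached only once all partitions are loaded, so the global
    # operation never runs concurrently with itself.
    finalize()
    return loaded


if __name__ == "__main__":
    done = build_all(
        range(1970, 2017),
        load_partition=lambda year: year,          # placeholder load
        finalize=lambda: print("global step ran once"),
    )
    print(len(done), "partitions loaded")
```

The point of the sketch is only the ordering: the per-year work is parallel, and the global operation is outside the parallel section.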