Partitioning, parallelization and projections with Vertica

I use DSS v4.0.1.

I have a CSV input dataset partitioned by year in files (/%Y_dataset_src), and a recipe that prepares the data into a Vertica dataset (partitioned by %Y on a date column).

I need parallelization because this job takes quite long (20h).

The partitioning is OK and the execution works well year by year.

When I build all the years at once (1970/2016), the job starts well and processes 4 partitions in parallel.

But after 2 or 3 years it fails on most partitions (2 out of 3) with this error:

[Vertica][VJDBC](2083) ERROR: A Moveout operation is already in progress on projection public.dataset_super

I guess the problem comes from parallelizing the closing operation on the projection (something like indexing), which is global and therefore does not support parallelization. I guess it would be possible to run that projection step once all partitions are processed, but I don't know how to proceed.
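
As a stopgap, one idea I am considering is to retry a partition when it hits exactly this error, waiting a bit longer each time. Below is only a rough sketch in plain Python: `load_partition` is a hypothetical callable standing in for whatever writes one year into Vertica, `run_with_retry` is a name I made up, and none of this is a DSS or Vertica API. It assumes the error text shown above ends up in the exception message.

```python
import time

# Text taken from the Vertica error above; used to recognise the conflict.
MOVEOUT_MARKER = "A Moveout operation is already in progress"


def run_with_retry(load_partition, year, max_attempts=5, base_delay=1.0,
                   sleep=time.sleep):
    """Call load_partition(year), retrying only on the Moveout conflict.

    load_partition is a hypothetical function that loads one partition.
    The wait doubles after each failed attempt (1s, 2s, 4s, ...).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return load_partition(year)
        except Exception as exc:
            # Any other error, or the last attempt, is raised unchanged.
            if MOVEOUT_MARKER not in str(exc) or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

This would only hide the conflict instead of removing it, so I would still prefer a way to run the projection step once at the end.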