Partitioning, parallelization and projections with Vertica
UserBird
Dataiker, Alpha Tester Posts: 535 Dataiker
Hi,
I use DSS v4.0.1.
I have a CSV input dataset partitioned by year in files (/%Y_dataset_src), and a recipe that prepares the data into a Vertica dataset (partitioned by %Y on a date column).
I need parallelization because this job is quite long (20h).
The partitioning is OK, and the build works well when run year by year.
When I build all the years at once (1970 to 2016), the job starts well and runs 4 partitions in parallel at a time.
But after 2 or 3 years it fails on most partitions (2 out of 3) with this error:
[Vertica][VJDBC](2083) ERROR: A Moveout operation is already in progress on projection public.dataset_super
I guess the problem comes from running the finalization step of the projection (something like indexing) in parallel: it is global to the table and therefore does not support parallelization. I guess it is possible to run that projection step once, after all partitions have been processed, but I don't know how to proceed.
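To make the idea concrete, here is a minimal, untested-against-Vertica sketch in Python of one possible workaround: retrying a partition load when it fails with the Moveout error instead of failing the partition outright. This is not a DSS or Vertica API. `run_with_moveout_retry`, `load_partition` and the delay values are hypothetical names and numbers chosen for illustration, and the sketch assumes the error is transient, which is unverified.

```python
import time

# Text taken from the error message quoted above.
MOVEOUT_MARKER = "A Moveout operation is already in progress"


def run_with_moveout_retry(load_partition, year, max_attempts=5,
                           base_delay_s=30.0, sleep=time.sleep):
    """Call load_partition(year), retrying only on the Moveout error.

    load_partition is a hypothetical callable that loads one year into
    the Vertica table and raises an exception on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return load_partition(year)
        except Exception as exc:
            # Re-raise anything that is not the Moveout conflict,
            # and give up once the last attempt has failed.
            if MOVEOUT_MARKER not in str(exc) or attempt == max_attempts:
                raise
            # Wait longer after each failed attempt (linear backoff).
            sleep(base_delay_s * attempt)
```

A caller would wrap each per-year load, for example `run_with_moveout_retry(my_loader, 1970)`. Whether retrying is enough, or whether the partitions must be built strictly one at a time, depends on how long the Moveout actually holds the projection.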