Community Conundrum 28: News Engagement is live! Read More

Build several partitions in one go

Level 1
Build several partitions in one go


I want to synchronize an Oracle table of 1 billion rows to another Oracle table. The query is very long and I end up with the following Oracle error:

[11:06:27] [INFO] [dku.output.sql] - appended 178620000 rows, errors=0
[11:06:27] [INFO] [dku.utils] - Closing oracle.jdbc.driver.T4CConnection@7fc1cb4f
[11:06:27] [INFO] [dku.utils] - Conn oracle.jdbc.driver.T4CConnection@7fc1cb4f is now true
[11:06:27] [INFO] [dku.flow.activity] - Run thread failed for activity extract_items_NP
java.sql.SQLException: ORA-01555: snapshot too old: rollback segment number 5 with name "_SYSSMU5_3914542555$" too small
ORA-02063: preceding line from ORCBI01_SYSTEM

Therefore, I'd like to use partitioning to split query into multiple little queries. I use a discrete partition key that has a cardinality of about 200. When I run the job for just one key everything is ok.

Three questions:

  1. Is partitioning the right answer to my issue

  2. How can I build several partitions in one go

  3. How can run ALL partitions in one go without listing the 200 possible values

0 Kudos
3 Replies
Dataiker Alumni
Partitioning might indeed be an answer to your case. Beware that it requires to learn a bit about it first, and it needs some practice (don't be discouraged if things don't work on the first attempt !).

> synchronize an Oracle table of 1 billion rows to another Oracle table

If both datasets are partitioned, and you set the partition dependency to be "Equal" (which is the default), then DSS will indeed run the recipe partition by partition, as “multiple little queries”.

>How can I run several repartition key in one go

Specifying which partitions you want DSS to sync is done on the recipe page, just above the "run" button.

See to specify a (list of) partitions and more generaly to start learning about partitioning.

> How can run all partitions in one go without listing the 200 possible values

This isn't directly supported, but there is a workaround for now: add a recipe from the dataset for which you want to build all partitions to a dummy unpartitioned dataset. Define partition dependencies as “all available”.
0 Kudos
Level 1
Ok I have a clearer understanding now.

Anyway, something is not totally clear to me.

In the recipe page I choose "Explicit values" as partition dependencies to be able to run the recipe for a specific list of key. For instance 301420,300600.

But I also have to fill a value in the Build section of the recipe page. For example if I put 000000 inside when I run the recipe, DSS first deletes all the rows for the partition and it executes the following query :

Which obviously not what I need. Then it loads the data for partition 301420 and 300600, which is fine.

Can you enlighten me a little more please?
0 Kudos
Level 1
Ooops! I finally find the solution (and shame on me it was explicitly written...)

I need to fill Build text box with the syntax :
Any/Any/Any/... # Several values (ExactValue)

Now my third question is still opened : How I run all partition in one go ?
0 Kudos
Labels (2)
A banner prompting to get Dataiku DSS