Submit your innovative use case or inspiring success story to the 2023 Dataiku Frontrunner Awards! LET'S GO

Build several partitions in one go

Solved!
wawanco
Level 1
Build several partitions in one go

Hi,



I want to synchronize an Oracle table of 1 billion rows to another Oracle table. The query is very long and I end up with the following Oracle error:




[11:06:27] [INFO] [dku.output.sql] - appended 178620000 rows, errors=0
[11:06:27] [INFO] [dku.utils] - Closing oracle.jdbc.driver.T4CConnection@7fc1cb4f
[11:06:27] [INFO] [dku.utils] - Conn oracle.jdbc.driver.T4CConnection@7fc1cb4f is now true
[11:06:27] [INFO] [dku.flow.activity] - Run thread failed for activity extract_items_NP
java.sql.SQLException: ORA-01555: snapshot too old: rollback segment number 5 with name "_SYSSMU5_3914542555$" too small
ORA-02063: preceding line from ORCBI01_SYSTEM


Therefore, I'd like to use partitioning to split query into multiple little queries. I use a discrete partition key that has a cardinality of about 200. When I run the job for just one key everything is ok.



Three questions:




  1. Is partitioning the right answer to my issue

  2. How can I build several partitions in one go

  3. How can run ALL partitions in one go without listing the 200 possible values

0 Kudos
1 Solution
jrouquie
Dataiker Alumni
Partitioning might indeed be an answer to your case. Beware that it requires to learn a bit about it first, and it needs some practice (don't be discouraged if things don't work on the first attempt !).

> synchronize an Oracle table of 1 billion rows to another Oracle table

If both datasets are partitioned, and you set the partition dependency to be "Equal" (which is the default), then DSS will indeed run the recipe partition by partition, as “multiple little queries”.



>How can I run several repartition key in one go

Specifying which partitions you want DSS to sync is done on the recipe page, just above the "run" button.

See http://doc.dataiku.com/dss/latest/partitions/identifiers.html to specify a (list of) partitions and more generaly http://doc.dataiku.com/dss/latest/partitions/index.html to start learning about partitioning.



> How can run all partitions in one go without listing the 200 possible values

This isn't directly supported, but there is a workaround for now: add a recipe from the dataset for which you want to build all partitions to a dummy unpartitioned dataset. Define partition dependencies as “all available”.

View solution in original post

0 Kudos
4 Replies
jrouquie
Dataiker Alumni
Partitioning might indeed be an answer to your case. Beware that it requires to learn a bit about it first, and it needs some practice (don't be discouraged if things don't work on the first attempt !).

> synchronize an Oracle table of 1 billion rows to another Oracle table

If both datasets are partitioned, and you set the partition dependency to be "Equal" (which is the default), then DSS will indeed run the recipe partition by partition, as “multiple little queries”.



>How can I run several repartition key in one go

Specifying which partitions you want DSS to sync is done on the recipe page, just above the "run" button.

See http://doc.dataiku.com/dss/latest/partitions/identifiers.html to specify a (list of) partitions and more generaly http://doc.dataiku.com/dss/latest/partitions/index.html to start learning about partitioning.



> How can run all partitions in one go without listing the 200 possible values

This isn't directly supported, but there is a workaround for now: add a recipe from the dataset for which you want to build all partitions to a dummy unpartitioned dataset. Define partition dependencies as “all available”.
0 Kudos
wawanco
Level 1
Author
Ok I have a clearer understanding now.

Anyway, something is not totally clear to me.

In the recipe page I choose "Explicit values" as partition dependencies to be able to run the recipe for a specific list of key. For instance 301420,300600.

But I also have to fill a value in the Build section of the recipe page. For example if I put 000000 inside when I run the recipe, DSS first deletes all the rows for the partition and it executes the following query :
DELETE FROM "EXT_STN_PIE_PIECES" WHERE "LOGIN" = '000000'

Which obviously not what I need. Then it loads the data for partition 301420 and 300600, which is fine.

Can you enlighten me a little more please?
0 Kudos
wawanco
Level 1
Author
Ooops! I finally find the solution (and shame on me it was explicitly written...)

I need to fill Build text box with the syntax :
Any/Any/Any/... # Several values (ExactValue)

Now my third question is still opened : How I run all partition in one go ?
0 Kudos
JIshikawa
Level 3
0 Kudos

Labels

?
Labels (2)
A banner prompting to get Dataiku