
How to parallelize SQL query

JohnB
Level 3

I have a MySQL table with a billion rows that can be partitioned into 100,000 partitions by a category column.

From that table, I would like to run a complex query against each partition to produce another, unaggregated subset of rows.

How can I do this in DSS so that these queries run a few at a time in parallel threads, and then consolidate the results?

AlexT
Dataiker

Hi,

If your SQL dataset is partitioned (for example, by the category column), see: https://doc.dataiku.com/dss/latest/partitions/sql_datasets.html

Each partition is then built as its own activity, so several partitions run in parallel threads. By default, a DSS instance executes up to 5 activities simultaneously; this limit can be increased depending on your use case, see:

https://doc.dataiku.com/dss/latest/flow/limits.html
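To make the pattern concrete outside of DSS's own partition scheduling (for example, in a Python recipe), here is a minimal sketch: one parameterized query per category value, run through a bounded thread pool, with the unaggregated rows concatenated at the end. It is an illustration under stated assumptions, not how DSS implements partitioned builds. It uses `sqlite3` as a stand-in for MySQL, and the table `events`, its columns, and the filter in `PARTITION_QUERY` are hypothetical placeholders for your real table and complex query. `max_workers=5` simply mirrors the default limit mentioned above.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing
from functools import partial

# Hypothetical placeholder for the "complex query": the category value is
# bound as a parameter, so each call reads only one partition's rows.
PARTITION_QUERY = (
    "SELECT id, category, value FROM events "
    "WHERE category = ? AND value > 10 ORDER BY id"
)


def list_partitions(db_path):
    """Return the distinct category values, one per partition."""
    with closing(sqlite3.connect(db_path)) as conn:
        return [row[0] for row in conn.execute(
            "SELECT DISTINCT category FROM events ORDER BY category")]


def run_partition(db_path, category):
    """Run the per-partition query on its own connection."""
    # Each worker opens its own connection instead of sharing one across threads.
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(PARTITION_QUERY, (category,)).fetchall()


def run_all_partitions(db_path, categories, max_workers=5):
    """Run one query per category, at most max_workers at a time, and
    concatenate the unaggregated rows in category order."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, regardless of finish order.
        for rows in pool.map(partial(run_partition, db_path), categories):
            results.extend(rows)
    return results
```

With 100,000 partitions, each query carries its own connection and round-trip overhead, which this sketch does not try to amortize; it only shows the bounded-parallelism-then-consolidate shape of the approach.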
