How to parallelize SQL query

JohnB · 08-19-2021

I have a MySQL table with a billion rows that can be split into 100,000 partitions by a category column.

I would like to run a complex query against each partition of that table. Each query produces an unaggregated subset of rows.

How can I set this up in DSS so that the queries run a few at a time in parallel threads and the results are then consolidated?

AlexT · 08-20-2021

Hi,

If your SQL dataset is partitioned in DSS, see: https://doc.dataiku.com/dss/latest/partitions/sql_datasets.html

Each partition then gets its own activity, so multiple partitions run in parallel. By default, up to 5 activities can run simultaneously on a DSS instance. This limit can be increased depending on your use case.

https://doc.dataiku.com/dss/latest/flow/limits.html
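
For illustration only, here is a rough sketch of the pattern described above: one query per partition, a cap on how many run at once, and the results concatenated at the end. This is not DSS code, since DSS schedules the activities itself. It uses SQLite as a stand-in for MySQL so that it runs anywhere, and the `events` table, its `category` and `value` columns, and the filter in the query are all made-up placeholders.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 5  # same number as the default activity limit mentioned above


def build_demo_db(path):
    """Create a tiny stand-in for the big table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE events (category TEXT, value INTEGER)")
    rows = [(f"cat{c}", v) for c in range(10) for v in range(20)]
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    conn.commit()
    conn.close()


def query_partition(path, category):
    """Run the per-partition query on its own connection; rows stay unaggregated."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute(
            "SELECT category, value FROM events "
            "WHERE category = ? AND value % 5 = 0",  # placeholder for the real query
            (category,),
        ).fetchall()
    finally:
        conn.close()


def run_all(path, categories, max_parallel=MAX_PARALLEL):
    """Query partitions with bounded parallelism, then concatenate the results."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        per_partition = pool.map(lambda c: query_partition(path, c), categories)
        return [row for rows in per_partition for row in rows]


def demo():
    """Build the demo table in a temporary file and run every partition query."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        build_demo_db(path)
        categories = [f"cat{c}" for c in range(10)]
        return run_all(path, categories)


if __name__ == "__main__":
    print(len(demo()), "rows consolidated")
```

Each worker opens its own connection, so no connection is shared between threads, and `max_workers` plays the role that the concurrent-activity limit plays in DSS.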
