How to parallelize a SQL query
JohnB
Registered Posts: 32 ✭✭✭✭✭
I have a MySQL table of a billion rows that can be split into 100,000 partitions by a category column.
I would like to run a complex query against each partition of that table to produce another, unaggregated subset of rows.
How can I set this up in DSS so that a few of these queries run in parallel at a time and the results are then consolidated?
Best Answer
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi,
You can define your SQL dataset as a partitioned dataset in DSS, see: https://doc.dataiku.com/dss/latest/partitions/sql_datasets.html
Once the dataset is partitioned, each partition is built as its own activity, so the per-partition queries run in parallel. By default, 5 activities can be executed simultaneously on a DSS instance. This limit can be increased depending on your use case.
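To illustrate the execution model only, here is a minimal plain-Python sketch of "one task per partition, at most 5 at a time, then consolidate". It is not DSS code and does not use the DSS API, since DSS schedules the activities itself. The in-memory `ROWS`, the function names `run_partition_query` and `run_all`, and the filter used as the "complex query" are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the MySQL table: (category, value) rows.
ROWS = [("a", 1), ("a", 5), ("b", 2), ("b", 8), ("c", 3), ("c", 9)]

# Mirrors the default of 5 simultaneous activities mentioned in the answer.
MAX_PARALLEL = 5


def run_partition_query(category):
    """Hypothetical per-partition query: keep rows of one category with value > 2.

    In a real setup this would be a SQL query restricted to a single
    partition value; the rows come back unaggregated.
    """
    return [(c, v) for (c, v) in ROWS if c == category and v > 2]


def run_all(categories, max_parallel=MAX_PARALLEL):
    """Run one task per partition, at most max_parallel at a time, and merge the rows."""
    consolidated = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map yields results in the same order as `categories`,
        # even though the tasks themselves may finish out of order.
        for partition_rows in pool.map(run_partition_query, categories):
            consolidated.extend(partition_rows)
    return consolidated


result = run_all(["a", "b", "c"])
```

The bounded pool plays the role of the activity limit: raising `max_parallel` lets more partitions run at once, at the cost of more concurrent load on the database.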