How to reduce the time it takes to upload to Tableau Server

Yusuke

I uploaded data to Tableau Server and it took about 8 hours (about 200 million rows and 40 columns of data).
I ran the upload with the TableauHyperFormat plugin (https://www.dataiku.com/product/plugins/tableau-hyper-export/). The logs show it reading 5,000 rows at a time into a DataFrame and executing writes of 2,000 rows each.

Is it possible to increase the upload speed by editing the plugin's parameters to change the amount of data written at a time?

Also, are there any other ways to speed up the upload?

Thank you in advance.

CatalinaS
Dataiker

Hi @Yusuke,

Looking at the plugin code, I can confirm that the writes are indeed executed in batches of 2,000 rows:

https://github.com/dataiku/dss-plugin-tableau-hyper/blob/master/python-lib/tableau_table_writer.py#L...

The batch size is hardcoded to 2000 here:

https://github.com/dataiku/dss-plugin-tableau-hyper/blob/master/python-lib/tableau_table_writer.py#L... 

You could try editing this parameter in the plugin, but it may only slightly improve the upload speed.
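To illustrate why a larger batch size may help only a little, here is a minimal sketch of a writer that buffers rows and flushes them in fixed-size batches. `BatchedWriter`, `flush_fn`, and `count_flushes` are hypothetical names for illustration; this is not the plugin's actual code in `tableau_table_writer.py`. Changing `batch_size` only changes how many write calls are made: for 200 million rows that is 100,000 calls at 2,000 rows versus 20,000 calls at 10,000 rows. The total number of rows converted and sent stays the same, so if most of the time goes to per-row work or to the upload itself, the gain is likely to be small (an assumption, not something measured in this thread).

```python
from typing import Callable, List, Sequence

Row = Sequence[object]


class BatchedWriter:
    """Buffers rows and hands them to flush_fn in fixed-size batches.

    Hypothetical illustration only; not the plugin's actual writer class.
    """

    def __init__(self, flush_fn: Callable[[List[Row]], None], batch_size: int = 2000) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.flush_fn = flush_fn          # e.g. a function that inserts rows into a Hyper file
        self.batch_size = batch_size      # rows per write call (hardcoded to 2000 in the plugin)
        self._buffer: List[Row] = []
        self.flush_count = 0              # number of write calls made so far

    def write_row(self, row: Row) -> None:
        # Accumulate rows and flush as soon as a full batch is available.
        self._buffer.append(row)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Send the current buffer (if any) and start a fresh one.
        if self._buffer:
            self.flush_fn(self._buffer)
            self.flush_count += 1
            self._buffer = []

    def close(self) -> None:
        # Write any remaining partial batch.
        self.flush()


def count_flushes(n_rows: int, batch_size: int) -> int:
    """Number of write calls needed for n_rows (ceiling division)."""
    return -(-n_rows // batch_size)
```

For example, `count_flushes(200_000_000, 2000)` is 100,000 and `count_flushes(200_000_000, 10_000)` is 20,000: five times fewer calls, but exactly the same amount of data.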

Yusuke
Author

Thanks for the advice.
I will give it a try.

Yusuke
Author

Hi @CatalinaS,

Thanks for the advice.
I changed the batch size to 10,000 and ran it, but did not see much improvement in speed.

Currently the plugin reads only 5,000 rows at a time into the DataFrame. Could reading more rows at a time improve the speed?

I am not familiar enough with the code to find where the number of rows read into the DataFrame is set.
Any advice on how to increase the number of rows read at a time would be appreciated.
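The thread does not identify where the 5,000-row read size is set in the plugin, so the following is only a generic sketch of reading rows in configurable chunks. `iter_chunks` and `chunk_size` are hypothetical names, not part of the plugin. If you read the data yourself through the Dataiku Python API, `Dataset.iter_dataframes()` accepts a `chunksize` argument that plays the same role (worth checking against the documentation for your DSS version). Larger chunks mean fewer iterations but more memory held per chunk.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(rows: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most chunk_size items from rows.

    Hypothetical helper for illustration; not part of the plugin.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    it = iter(rows)
    while True:
        # Pull up to chunk_size items; an empty list means the source is exhausted.
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk
```

For example, `[len(c) for c in iter_chunks(range(12_000), 5_000)]` gives `[5000, 5000, 2000]`, while a `chunk_size` of 20,000 reads the same rows in a single chunk.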
