Reading dataset in Python recipe is very slow

beatrix_lin
Level 1
I was using the following lines to read a dataset into a pandas dataframe:

import dataiku

data = dataiku.Dataset('dataset name')

df_data = data.get_dataframe()

It takes almost 3 minutes to read in the table. Alternatively, if I export the dataset to CSV and read in the CSV, it only takes 12 s. Is there a more efficient way to read the dataset as a pandas dataframe without creating the intermediate CSV file?

Thanks,
2 Replies
Clément_Stenac
Hi,

How long does it take to export the dataset into CSV?
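One way to answer this is to time each step on its own. The sketch below is a minimal timing helper that uses only the standard library. `time_call` is a hypothetical name, not part of the Dataiku API, and the commented usage lines assume the `dataiku` package is importable inside the recipe and that the exported file is called `exported.csv`.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Hypothetical usage inside a Dataiku Python recipe (not run here):
#   import dataiku
#   import pandas as pd
#   data = dataiku.Dataset('dataset name')
#   df, seconds = time_call(data.get_dataframe)
#   print(f"get_dataframe: {seconds:.1f}s")
#   df_csv, seconds = time_call(pd.read_csv, 'exported.csv')  # assumed file name
#   print(f"read_csv: {seconds:.1f}s")
```

Timing the export, the CSV parse, and the `get_dataframe` call separately should show which step accounts for most of the 3 minutes.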
0 Kudos
Tsfan_22
Level 2

Hi @Clément_Stenac,

 

I'm experiencing the same issue as the user above. It takes around 11 minutes to load a 2.2 GB dataset into a dataframe. Running it on my laptop takes around 1 min 15 s.

Exporting the dataset to CSV takes about as long as loading it into the dataframe.

Any tips on how to speed this up or what storage type to use for quicker loads?
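One option that may be worth testing, as a sketch rather than a confirmed fix: load only the columns the recipe actually needs, and cap the row count while experimenting. `columns` and `limit` are parameters of `Dataset.get_dataframe`; the helper name `load_subset` and the column names in the usage comment are hypothetical, and whether this helps depends on where the time is actually spent.

```python
def load_subset(dataset, columns, limit=None):
    """Load only the named columns, optionally capped at `limit` rows.

    `dataset` is assumed to be a dataiku.Dataset (or anything exposing a
    compatible get_dataframe method).
    """
    return dataset.get_dataframe(columns=columns, limit=limit)


# Hypothetical usage inside a Dataiku Python recipe (not run here):
#   import dataiku
#   data = dataiku.Dataset('dataset name')
#   df_small = load_subset(data, ['col_a', 'col_b'], limit=100000)
```

If the subset loads much faster than the full table, the cost is probably proportional to the volume of data transferred and parsed rather than a fixed overhead.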

 

Thanks!

0 Kudos
