Reading dataset in Python recipe is very slow

beatrix_lin
beatrix_lin Registered Posts: 3 ✭✭✭✭
I was using the following lines to read a dataset into a pandas DataFrame:

import dataiku

data = dataiku.Dataset('dataset name')
df_data = data.get_dataframe()

It takes almost 3 minutes to read the table. Alternatively, if I export the dataset to CSV and read the CSV file, it only takes 12 s. Is there a more efficient way to read the dataset as a pandas DataFrame without creating the intermediate CSV file?

Thanks,
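
One thing that may be worth trying before falling back to a CSV export is to read less data per call. The sketch below is not a confirmed fix: it assumes that `get_dataframe` accepts `columns` and `limit` arguments and that `iter_dataframes` accepts `chunksize`, as described in the `dataiku` package's Python API reference. `read_subset` and `count_rows_in_chunks` are hypothetical helper names, and whether either call is actually faster will depend on how the dataset is stored.

```python
def read_subset(dataset, columns=None, limit=None):
    """Read only the named columns and/or the first `limit` rows.

    `dataset` is expected to be a dataiku.Dataset; `columns` and `limit`
    are assumed to be supported by its get_dataframe method.
    """
    return dataset.get_dataframe(columns=columns, limit=limit)


def count_rows_in_chunks(dataset, chunksize=100_000):
    """Stream the dataset chunk by chunk instead of loading it at once.

    Returns the total number of rows seen. Replace the body of the loop
    with whatever per-chunk processing the recipe needs.
    """
    total = 0
    for chunk in dataset.iter_dataframes(chunksize=chunksize):
        total += len(chunk)
    return total


# Inside a DSS Python recipe this might be used as follows
# ('col_a' is a placeholder column name):
#
#   import dataiku
#   data = dataiku.Dataset('dataset name')
#   df_small = read_subset(data, columns=['col_a'], limit=1000)
#   n_rows = count_rows_in_chunks(data)
```

Reading a small subset first is also a quick way to check whether the load time scales with the number of rows and columns or is dominated by a fixed overhead.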

Answers

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker
    Hi,

    How long does it take to export the dataset into CSV?
  • Tsfan_22
    Tsfan_22 Registered Posts: 5

    Hi @Clément_Stenac,

    I'm experiencing the same issue as the user above. It takes around 11 minutes to load a 2.2 GB dataset into a dataframe, whereas running on my laptop it takes around 1 min 15 s.

    Exporting the dataset to CSV takes about as long as loading it into the dataframe.

    Any tips on how to speed this up or what storage type to use for quicker loads?

    Thanks!
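
To compare the timings discussed in this thread (loading into a dataframe, exporting to CSV, reading the CSV back), it can help to measure each step separately in the same recipe. The following is a minimal sketch using only the standard library; `timed` is a hypothetical helper name, and the `sum(...)` call is a stand-in for the real step being measured.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block in `results`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}

# Stand-in workload; in a recipe this would be, for example,
# df_data = data.get_dataframe()
with timed("example step", timings):
    sum(range(100_000))

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

Wrapping each step in its own `timed` block shows whether the time is spent reading from storage or building the dataframe.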
