Tabular datasets stored in managed filesystem as Python pickles or R *.rds files

Marek

Does Dataiku plan to support tabular (data frame) datasets stored in the managed filesystem as Python pickles or R *.rds files, and if so, when?

If not, what solutions are planned for Dataiku DSS to speed up reading and writing Python or R data frames on the managed filesystem? I ask because the current implementations of the dkuReadDataset and dkuWriteDataset functions, which rely on the .csv file format, are unacceptably slow.

Clément_Stenac

Hi,

We do not have short-term plans to support pickle or RData/RDS as underlying formats for datasets (which would give a fast path when reading or writing from a Python or R recipe).

We definitely do not rule this out, and we will take note of your feedback to inform our future priorities.

The main reason for this currently lower priority is that it would be of limited applicability: it would only make a difference for reading and writing datasets stored on the local filesystem, and only when reading/writing in Python or R recipes. These only represent a small subset of how customers leverage Dataiku.

Of course, you are entirely free to use "normal" pickle / RData loading and writing in a Python or R recipe; Dataiku does not limit in any way what you can do in your code. Notably, you can store such files in managed folders (instead of managed datasets): https://doc.dataiku.com/dss/latest/connecting/managed_folders.html
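For illustration, here is a minimal sketch of that approach in Python, using only the standard library. The helper names save_frame and load_frame are hypothetical, not part of any Dataiku API. The sketch assumes the managed folder lives on the local filesystem, so that its directory can be obtained in a recipe with something like dataiku.Folder("my_folder").get_path() (the folder name is made up). The same functions work with a pandas DataFrame, since pickle handles any picklable object.

```python
import pickle
import tempfile
from pathlib import Path


def save_frame(obj, folder_path, name):
    """Pickle `obj` to folder_path/name and return the resulting path."""
    target = Path(folder_path) / name
    with open(target, "wb") as f:
        # The highest protocol is usually the fastest and most compact.
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    return target


def load_frame(folder_path, name):
    """Load a previously pickled object from folder_path/name."""
    with open(Path(folder_path) / name, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Assumption: in a DSS recipe, folder_path would instead come from a
    # local managed folder, e.g. dataiku.Folder("my_folder").get_path().
    # A temporary directory stands in for it here.
    with tempfile.TemporaryDirectory() as folder_path:
        rows = [{"id": 1, "value": 0.5}, {"id": 2, "value": 1.5}]
        save_frame(rows, folder_path, "rows.pkl")
        restored = load_frame(folder_path, "rows.pkl")
        print(restored == rows)
```

Note that pickle files should only be loaded from sources you trust, and that a pickle written by one version of Python or pandas may not load in another.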

 
