Need more dataset file formats, such as Parquet, when exporting datasets to the local system

PANKAJ Partner, L2 Admin, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 26 Partner

To unit test final and intermediate datasets, we need more file formats, such as Parquet, Avro, sas7bdat, and ORC, when exporting large datasets to the local system, since the CSV format can't handle more than 1 million records.

1 vote

Considered · Last Updated

Comments

  • natejgardner
    natejgardner Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 151 Neuron

    It would also be great if more file formats were allowed for import.

    Microsoft Access, SQLite, and edb come to mind as most frequently needed.

  • PANKAJ
    PANKAJ Partner, L2 Admin, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 26 Partner

    @natejgardner

    Yes, I agree that more file format options for importing datasets would also be very helpful.

  • AshleyW
    AshleyW Dataiker, Alpha Tester, Dataiku DSS Core Designer, Registered, Product Ideas Manager Posts: 161 Dataiker

    Hi @natejgardner,

    FYI, Microsoft Access and SQLite are supported file formats for importing data into DSS. I've provided links to the relevant reference documentation and community articles. If there are file formats that we don't support yet that you'd like to see made available in DSS, feel free to add that as a separate post on the Product Ideas board.

    Best,

    Ashley

  • natejgardner
    natejgardner Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 151 Neuron

    Thanks @AshleyW. Unfortunately, these approaches require the files to already be exposed on the network or manually uploaded to the Dataiku server, but most teams I've worked with that generate these files just send them as file attachments. Ideally, they could be uploaded and processed as true flat files, the same way Excel and CSV files are. Even when teams do upload their flat-file databases to a network location, if it is a Windows file share and the Dataiku instance doesn't have SAML authentication configured, there's no way to authenticate. It would be a big time saver if the drivers that convert flat-file databases into SQL connections could also be embedded directly into the file parsing system, so that Access and SQLite become supported as file formats as well.
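For anyone who wants to picture the "treat a SQLite file like a flat file" idea from the comment above, here is a minimal sketch in plain Python. It is not DSS functionality or the DSS API: it only uses the standard-library `sqlite3` module, and the helper names `load_sqlite_tables` and `make_sample_db`, as well as the sample `customers` table, are hypothetical and exist only for illustration. It assumes the attached file is a valid SQLite database that fits in memory.

```python
import sqlite3
import tempfile
from pathlib import Path


def load_sqlite_tables(db_path):
    """Return {table_name: (column_names, rows)} for every user table in a SQLite file.

    Hypothetical helper: it parses the file directly, with no pre-configured connection.
    """
    tables = {}
    # Open the file read-only through a URI so the attachment is never modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # List user tables from the file's own catalog, skipping SQLite internals.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        for name in names:
            # Quote the identifier defensively; it comes from the file, not from us.
            quoted = '"' + name.replace('"', '""') + '"'
            cursor = conn.execute(f"SELECT * FROM {quoted}")
            columns = [col[0] for col in cursor.description]
            tables[name] = (columns, cursor.fetchall())
    finally:
        conn.close()
    return tables


def make_sample_db(db_path):
    """Create a tiny stand-in for a SQLite file received as an attachment (made-up data)."""
    conn = sqlite3.connect(str(db_path))
    try:
        conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany(
            "INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
        )
        conn.commit()
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        sample_path = Path(tmp_dir) / "sample.db"
        make_sample_db(sample_path)
        for table, (columns, rows) in load_sqlite_tables(sample_path).items():
            print(table, columns, len(rows), "rows")
```

Each table comes back as a list of column names plus a list of row tuples, which is roughly the shape a file parser would hand to the rest of a pipeline. Microsoft Access files would need a separate driver and are not covered by this sketch.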
