Storing Datasets in Dataiku (with SQL Server and More)

ZeevGetner

Hi,

I am about to start a project with Dataiku and have two questions:

1) I saw in the tutorials that one basic thing to do with Dataiku is to create a dataset. I wanted to know where datasets are stored.

2) Is it possible (and also sensible) to store them in SQL Server?

The reason I am asking is that the project involves several stages. Some of them are simple and can be developed in SQL Server, while others are closer to mathematical models and should be developed in Dataiku with Python.

Also, it is important to note that the users are interested in seeing the results after each stage.

Thank you

Please note the topic of this post was modified by the moderator

Answers

  • Miguel Angel (Dataiker)

    Hi,

    1)

    Datasets are stored in the location specified by their connection settings.
    This can be a database table, files on disk, S3 buckets, etc. The icon of the dataset hints at how the data is stored. For a clearer view of where the data actually lives, you can double-click the dataset in the Flow and go to Settings > Connection.

    These connections are specified in the DSS instance's Administration page under the Connections section.
    By default, a fresh DSS install already comes with a connection to the DSS server's filesystem. This means that datasets using that connection are saved to the 'managed_datasets' directory inside the DSS data directory (the directory chosen when DSS was installed).

    More information about data storage can be found in the documentation:
    https://doc.dataiku.com/dss/latest/operations/datadir.html

    https://doc.dataiku.com/dss/latest/connecting/index.html

    2)

    As follows from the above, data can be stored wherever it makes sense for your business requirements.
    Microsoft SQL Server is a supported connection type, so that should not be a problem (https://doc.dataiku.com/dss/latest/connecting/sql/sqlserver.html).
    One thing to keep in mind regarding dataset storage is the execution (computation) engine. It is generally recommended to push the workload to where the data lives, as this usually performs better. So, if you plan on using databases, it is worth looking into the in-database engine instead of the DSS in-memory engine: https://knowledge.dataiku.com/latest/courses/basics/explore-flow/concept-computation-engine.html
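    To make the "push the work to where the data lives" point concrete, here is a minimal Python sketch. It uses an in-memory SQLite database from the standard library as a stand-in for SQL Server (an assumption purely for illustration; this is not DSS code), and the `sales` table and the function names are hypothetical. Both functions return the same totals, but the first lets the database run the GROUP BY and return one row per region, while the second pulls every row into Python before aggregating, which is roughly what an in-memory engine does with database data.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    # In-memory SQLite database standing in for SQL Server (illustration only)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10.0), ("north", 5.0), ("south", 7.5), ("south", 2.5), ("east", 4.0)],
    )
    conn.commit()
    return conn


def totals_in_database(conn: sqlite3.Connection) -> dict:
    # Aggregation is pushed down: only one row per region leaves the database
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


def totals_in_memory(conn: sqlite3.Connection) -> dict:
    # Every row is fetched into Python first, then aggregated in memory
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


if __name__ == "__main__":
    conn = build_demo_db()
    print(totals_in_database(conn))
    print(totals_in_memory(conn))
```

    On a table with millions of rows, moving a handful of aggregated rows instead of every row is typically where the performance gain of in-database computation comes from.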

    3)

    Regarding users seeing the results after each stage: by default, you can also explore a sample of the inputs and outputs of each recipe in the Flow.
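    As a rough illustration of a pipeline in which every stage writes its own output table, so that intermediate results stay inspectable, here is a minimal Python sketch. SQLite again stands in for SQL Server, and the table names, the two stages (a SQL filtering step followed by a Python scoring step) and the `sample` helper are all hypothetical; in DSS the equivalent would be recipes producing datasets, not these functions.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    # Hypothetical raw input table; SQLite stands in for SQL Server
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders_raw (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders_raw VALUES (?, ?)",
        [("alice", 30.0), ("bob", 10.0), ("alice", -5.0), ("carol", 60.0)],
    )
    return conn


def stage_sql_clean(conn: sqlite3.Connection) -> None:
    # Stage 1 (pure SQL): keep only valid rows and materialize them as a new table
    conn.execute("DROP TABLE IF EXISTS orders_clean")
    conn.execute(
        "CREATE TABLE orders_clean AS "
        "SELECT customer, amount FROM orders_raw WHERE amount > 0"
    )


def stage_python_score(conn: sqlite3.Connection) -> None:
    # Stage 2 (Python): compute each customer's share of total spend
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders_clean"):
        totals[customer] = totals.get(customer, 0.0) + amount
    grand_total = sum(totals.values())
    conn.execute("DROP TABLE IF EXISTS customer_scores")
    conn.execute("CREATE TABLE customer_scores (customer TEXT, share REAL)")
    conn.executemany(
        "INSERT INTO customer_scores VALUES (?, ?)",
        [(c, t / grand_total) for c, t in sorted(totals.items())],
    )


def sample(conn: sqlite3.Connection, table: str, limit: int = 5) -> list:
    # Return a few rows of an intermediate table; `table` must be a trusted name
    return conn.execute(f"SELECT * FROM {table} LIMIT ?", (limit,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    stage_sql_clean(conn)
    stage_python_score(conn)
    # Each stage's output can be inspected on its own
    print(sample(conn, "orders_clean"))
    print(sample(conn, "customer_scores"))
```

    Keeping each stage's result as its own table costs some storage, but it means business users can look at any intermediate step without rerunning the whole chain.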

  • ZeevGetner

    1) Thank you

    2) I want to ask a follow-up question.

    I am responsible for migrating an existing system from SAS to Dataiku.

    Some of the stages are written as SQL UPDATE/INSERT/DELETE statements.

    Is it possible (and sensible) to use SQL within Dataiku?
