
Storing Datasets in Dataiku (with SQL Server and More)

ZeevGetner
Level 1

Hi,

 

I am about to start a project with Dataiku and have two questions.

 

1) I saw in the tutorials that one of the basic things to do with Dataiku is to create a dataset. Where are these datasets stored?

2) Is it possible (and does it make sense) to store them in SQL Server?

The reason I am asking is that the project involves several stages. Some of them are simple and can be developed in SQL Server, while others are mathematical models that should be developed in Dataiku with Python.

Also, it is important to note that the users want to see the results after each stage.

 

Thank you, everybody.

 

Please note the topic of this post was modified by the moderator

2 Replies
MiguelangelC
Dataiker

Hi,

1)

Datasets are stored in the location specified by their connection settings.
This can be a database table, files on disk, S3 buckets, etc. The dataset's icon hints at how the data is stored. To see exactly where the data lives, double-click the dataset in the Flow and go to Settings > Connection.

These connections are defined on the DSS instance's Administration page, under the Connections section.
By default, a fresh DSS install already comes with a connection to the DSS server's filesystem. Datasets using that connection are saved to the 'managed_datasets' directory in your 'Data directory' (the data directory chosen when DSS was installed).

More information about data storage can be found in the help:
https://doc.dataiku.com/dss/latest/operations/datadir.html

https://doc.dataiku.com/dss/latest/connecting/index.html

2)

As follows from the above, data can be stored wherever it makes sense for your business requirements.
Microsoft SQL Server is a supported connection, so that should be no problem (https://doc.dataiku.com/dss/latest/connecting/sql/sqlserver.html).
One thing to keep in mind regarding dataset storage is the execution (computation) engine. It is generally recommended to run the workload where the data lives, as this often performs better. So, if you plan to use databases, it is worth looking into the in-database engine instead of the DSS in-memory engine: https://knowledge.dataiku.com/latest/courses/basics/explore-flow/concept-computation-engine.html
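
To illustrate the idea of running the workload where the data lives, here is a minimal sketch in plain Python. It does not use the Dataiku API: the standard-library sqlite3 module stands in for SQL Server, and the table `sales` and both function names are hypothetical. The first function lets the database do the aggregation and returns only the summary rows; the second pulls every row out and aggregates in Python, which is roughly what happens when the computation is not pushed down.

```python
import sqlite3

# Assumption: sqlite3 is only a stand-in for SQL Server so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 5.0), ("south", 7.5)],
)


def totals_in_database(conn):
    """Push the aggregation down: the database returns one row per region."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


def totals_in_python(conn):
    """Pull every row out of the database and aggregate locally."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals
```

Both functions give the same totals, but the first moves far less data out of the database, and that difference grows with table size.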

3)

Also, by default, you can explore a sample of the inputs and outputs of each recipe in the Flow.

 

ZeevGetner
Level 1
Author

1) Thank you

2) I want to ask a follow-up question.

I am responsible for migrating an existing system from SAS to Dataiku.

Some of the stages are written as SQL UPDATE/INSERT/DELETE statements.

Is it possible (and does it make sense) to use SQL within Dataiku?
