Flow in Dataiku

ya
Level 1
Hi,

When I create a flow, for example taking records from table A -> doing some blending -> grouping by -> building a model,

I see that each step in the process creates a recipe.

Does this mean that each step saves the data, e.g. to CSV if we read the data from CSV, or to a SQL table if we read from a SQL DB?

Or is the process in-memory?



Thanks
AdrienL
Dataiker

Hi,



A recipe reads from its input (upstream) dataset and writes into its output dataset every time you run it, including when you run several recipes to build a final dataset. Some recipes can, for instance, be executed directly in the SQL database, depending on where your data lives and the engine you set for that recipe (see Execution Engines). Depending on the engine and recipe, the data may be streamed (needing little memory) or loaded fully into memory. By default, DSS tries to help by pre-selecting the best available engine for each recipe.
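
For illustration, a Python recipe can consume its input either fully in memory or streamed in chunks with the dataiku package; a minimal sketch (the dataset name is hypothetical, and this only runs inside DSS where the package is available):

```python
import dataiku

# Hypothetical dataset name, for illustration
ds = dataiku.Dataset("table_A")

# Fully in memory: the whole dataset becomes a single pandas DataFrame
df = ds.get_dataframe()
print(len(df))

# Streamed: only one chunk is held in memory at a time
n_rows = 0
for chunk in ds.iter_dataframes(chunksize=10000):
    n_rows += len(chunk)
print(n_rows)
```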



Under certain conditions, you can skip writing the intermediate datasets you don't need by using Spark pipelines.
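
As a sketch of automating that setting through the public API, the code below marks an intermediate dataset as virtualizable so a Spark pipeline may skip materializing it. The connection details and dataset name are placeholders, and the flowOptions/virtualizable field name is my assumption about the dataset's JSON definition:

```python
import dataikuapi

# Placeholder host, API key, and project/dataset names
client = dataikuapi.DSSClient("http://localhost:11200", "YOUR_API_KEY")
project = client.get_project("MYPROJECT")
dataset = project.get_dataset("intermediate_dataset")

# Assumption: the dataset definition exposes a flowOptions.virtualizable flag
definition = dataset.get_definition()
definition.setdefault("flowOptions", {})["virtualizable"] = True
dataset.set_definition(definition)
```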

ya
Level 1
Author
So if I load data from an MS SQL Server table, it will create a table for each step of the process (after grouping, cleansing, etc.)?
AdrienL
Dataiker
A table after each recipe, yes. Each recipe has a dataset as its output, and that dataset is written to a SQL table (or whichever other storage backend you configure).
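
For instance, from inside a recipe you can check where a given output dataset is actually stored; a small sketch (the dataset name is hypothetical, and the exact keys of the returned dict may vary):

```python
import dataiku

# Hypothetical name of one recipe output in the flow
ds = dataiku.Dataset("table_A_grouped")

# For a SQL dataset this typically includes the connection and table name
info = ds.get_location_info()
print(info["info"].get("connection"), info["info"].get("table"))
```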
muthu11
Level 2

I'd like to know how DSS loads data into SQL tables.

For example,

if we have 10K records in the source, when the flow is executed will it write the output in batches as records are processed (say, after every 1K records), or will it start writing only after all 10K records are processed?
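
For what it's worth, in a Python recipe the batching can be made explicit with chunked reads and writes; a minimal sketch (dataset names hypothetical), where each 1K-row chunk is written as soon as it is processed rather than after all 10K records:

```python
import dataiku

# Hypothetical dataset names, for illustration
input_ds = dataiku.Dataset("source_table")
output_ds = dataiku.Dataset("processed_table")

# Assumes the output dataset's schema is already defined in the Flow.
# Rows leave memory as each chunk is written.
with output_ds.get_writer() as writer:
    for chunk in input_ds.iter_dataframes(chunksize=1000):
        writer.write_dataframe(chunk)
```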