Data behavior when importing data from a DB connected to Dataiku
When Snowflake and Dataiku are connected and data is imported,
does Dataiku physically import the data?
Thank you in advance.
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,166 Neuron
No. When you use Dataiku datasets on a Snowflake connection, the data always stays in Snowflake. All Dataiku does is cache a data sample, 10,000 rows by default, to facilitate dataset exploration in the flow. Certain recipes, such as Python recipes or those using the DSS engine, require the data to be loaded into the DSS server first. All other recipes are pushed down to Snowflake, so the data is not loaded into memory.
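To make the difference between a pushed-down recipe and a DSS-engine recipe concrete, here is a minimal sketch. It does not use the Dataiku or Snowflake APIs: an in-memory `sqlite3` database stands in for Snowflake, and the table `orders` and the function names are hypothetical, chosen only for illustration. The point is how many rows leave the database in each case.

```python
import sqlite3

# Assumption: sqlite3 is only a stand-in for the Snowflake warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EU", 10.0), ("EU", 5.0), ("US", 7.5), ("US", 2.5), ("APAC", 4.0)],
)


def pushed_down_totals(conn):
    """In-database style: the aggregation runs inside the database,
    and only the aggregated result rows are returned."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows), len(rows)


def local_engine_totals(conn):
    """Local-engine style: every row is fetched first,
    then aggregated in this process."""
    rows = conn.execute("SELECT region, amount FROM orders").fetchall()
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals, len(rows)


def cached_sample(conn, limit=10000):
    """Exploration-sample style: fetch at most `limit` rows for preview."""
    return conn.execute(
        "SELECT region, amount FROM orders LIMIT ?", (limit,)
    ).fetchall()


pushed, rows_moved_pushdown = pushed_down_totals(conn)
local, rows_moved_local = local_engine_totals(conn)
print(pushed == local, rows_moved_pushdown, rows_moved_local)
```

Both functions produce the same totals, but the pushed-down version transfers only one row per region, while the local version transfers the whole table. On a real Snowflake table the gap grows with table size, which is why in-database engines are generally preferable when a recipe supports them.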
-
Thank you for your answer, it is very helpful.
For example, suppose that within a single flow I want to combine recipes that can be pushed down to Snowflake, such as a data-processing recipe, with recipes that use Python or the DSS engine.
Can the engine be changed for each recipe to match the recipe being created?
Also, if the goal of the flow is to perform visual statistics such as multivariate analysis or visual machine learning in the lab and incorporate it into the flow, would the DSS engine be used for statistical analysis and machine learning training and prediction?
I am using the on-premise version and am unable to verify the use of different engines, which is why I asked this question.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,166 Neuron
If I want to create a flow that combines a specific recipe, such as a recipe to process and a recipe that uses Python or DSS engine, can the engine to be processed be changed to match the recipe to be created?
I can't understand what you mean by this, but changing the engine of a recipe is possible in some cases and not in others. It's too complicated to cover in a post, so you should do your own testing to confirm your specific case.
Also, if the goal of the flow is to perform visual statistics such as multivariate analysis or visual machine learning in the lab and incorporate it into the flow, would the DSS engine be used for statistical analysis and machine learning training and prediction?
Yes.
I am using the on-premise version and am unable to verify the use of different engines, which is why I asked this question.
Not sure why you say you can't verify the engine. Recipe engines can be clearly seen in the flow. In fact there is a whole flow view for that.
-
Sorry, I didn't ask the right question.
I am using a Dataiku dataset on a Snowflake connection to preprocess data, perform statistical analysis, and do machine learning training and prediction.
I wanted to know whether, within that flow, the statistical analysis and the machine learning training and prediction parts can be pushed down to Snowflake as well.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,166 Neuron
-
LouisDHulst Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Neuron, Registered, Neuron 2023 Posts: 54 Neuron
All of the visual ML and statistical analysis parts of Dataiku require the DSS engine (or Spark for ML), but with the additions being made to Snowflake ML and Snowpark, it's now possible to use Dataiku to train Snowpark ML models. There's also the Visual Snowpark ML plugin.