Data behavior when importing data from a DB connected to Dataiku

kyo
kyo Registered Posts: 4 ✭✭

When Snowflake and Dataiku are connected and data is imported,
does Dataiku physically import the data?
Thank you in advance.

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,166 Neuron
    edited August 27

    No. When using Dataiku datasets on a Snowflake connection, the data always stays in Snowflake. All Dataiku does is cache a data sample, usually 10,000 rows unless changed, to facilitate dataset exploration in the flow. Certain recipes, such as Python recipes or those using the DSS engine, require the data to be loaded into the DSS server first. All other recipes are pushed down to Snowflake, so the data is not loaded into memory.
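To illustrate the difference for a Python recipe, here is a minimal sketch. It assumes a Snowflake-backed dataset and table with columns `CUSTOMER_ID` and `AMOUNT`; the dataset name, table name, columns, and helper function names are all hypothetical. `Dataset.get_dataframe()` pulls the rows into the memory of the DSS server, while `SQLExecutor2.query_to_df()` sends the query to Snowflake and brings back only the result.

```python
def build_total_per_customer_query(table_name):
    """Build an aggregation query; Snowflake does the work when it is executed."""
    # CUSTOMER_ID and AMOUNT are hypothetical column names.
    return (
        "SELECT CUSTOMER_ID, SUM(AMOUNT) AS TOTAL_AMOUNT "
        f"FROM {table_name} "
        "GROUP BY CUSTOMER_ID"
    )


def load_everything_into_dss(dataset_name):
    """Read the whole dataset into a pandas DataFrame on the DSS server."""
    import dataiku  # only importable inside DSS

    return dataiku.Dataset(dataset_name).get_dataframe()


def aggregate_in_snowflake(dataset_name, table_name):
    """Run the aggregation in Snowflake and fetch only the aggregated rows."""
    import dataiku  # only importable inside DSS
    from dataiku import SQLExecutor2

    # The executor reuses the SQL connection that backs the dataset.
    executor = SQLExecutor2(dataset=dataiku.Dataset(dataset_name))
    return executor.query_to_df(build_total_per_customer_query(table_name))
```

Inside a Python recipe, something like `aggregate_in_snowflake("orders", "ORDERS")` would keep the heavy lifting in Snowflake, whereas `load_everything_into_dss("orders")` would move every row to the DSS server first.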

  • kyo
    kyo Registered Posts: 4 ✭✭

    Thank you for your answer, it is very helpful.

    For example, suppose I want to create a single flow that combines recipes that can be pushed down to Snowflake, such as a processing recipe, with recipes that use Python or the DSS engine.
    Can the engine be changed for each recipe to match the recipe being created?

    Also, if the goal of the flow is to perform visual statistics such as multivariate analysis or visual machine learning in the lab and incorporate it into the flow, would the DSS engine be used for statistical analysis and machine learning training and prediction?

    I am using the on-premise version and am unable to verify the use of different engines, which is why I asked this question.

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,166 Neuron

    If I want to create a flow that combines a specific recipe, such as a recipe to process and a recipe that uses Python or DSS engine, can the engine to be processed be changed to match the recipe to be created?

    I can't understand what you mean by this, but changing the engine of a recipe is possible in some cases and not in others. It's too complicated to cover in a post, so you should do your own testing to answer your specific question.

    Also, if the goal of the flow is to perform visual statistics such as multivariate analysis or visual machine learning in the lab and incorporate it into the flow, would the DSS engine be used for statistical analysis and machine learning training and prediction?

    Yes.

    I am using the on-premise version and am unable to verify the use of different engines, which is why I asked this question.

    Not sure why you say you can't verify the engine. Recipe engines can be clearly seen in the flow. In fact, there is a whole flow view for that.

  • kyo
    kyo Registered Posts: 4 ✭✭

    Sorry, I didn't ask the right question.

    I am using Dataiku datasets on a Snowflake connection to preprocess data, perform statistical analysis, and run machine learning training and prediction.
    Given such a flow, I wanted to know whether the statistical analysis and the machine learning training and prediction parts can be pushed down to Snowflake as well.

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,166 Neuron
  • LouisDHulst
    LouisDHulst Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Neuron, Registered, Neuron 2023 Posts: 54 Neuron

    All of the visual ML and statistical analysis parts of Dataiku require the DSS engine (or Spark for ML), but with the additions being made to Snowflake ML and Snowpark, it's now possible to use Dataiku to train Snowpark ML models. There's also the Visual Snowpark ML plugin.
