Setting up Stages in Snowflake to work with Dataiku

tgb417

In Dataiku DSS, a Snowflake connection can be configured to use a stage. This apparently improves performance by increasing the number of operations that can run inside Snowflake, without having to ship data back to the DSS server for processing.

Are folks using this feature? What has your experience been in terms of performance improvements? Which steps have you tried that still do not work with the stage and force a round trip to the DSS server?
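To make the round-trip question concrete, here is a minimal Python sketch of an assumed placement model: each Prepare step either translates to SQL, needs the stage to run inside Snowflake, or forces a round trip to the DSS server, and one unsupported step is assumed to pull the whole recipe off the in-database engine. The processor names in `SQL_TRANSLATABLE` and `NEEDS_STAGE` are hypothetical placeholders; the real lists are in the Dataiku engine documentation, and this is not Dataiku's actual engine-selection logic.

```python
from enum import Enum
from typing import Iterable


class Placement(Enum):
    SQL = "translated to SQL"
    STAGE = "pushed down via the stage"
    DSS = "round trip to the DSS server"


# Hypothetical placeholder processor names, for illustration only.
SQL_TRANSLATABLE = {"rename_column", "filter_rows", "fill_empty"}
NEEDS_STAGE = {"parse_date", "split_url"}


def place_step(processor: str) -> Placement:
    """Classify a single Prepare step under the assumed model."""
    if processor in SQL_TRANSLATABLE:
        return Placement.SQL
    if processor in NEEDS_STAGE:
        return Placement.STAGE
    return Placement.DSS


def place_recipe(processors: Iterable[str], stage_configured: bool) -> Placement:
    """Assume the recipe runs where its least push-down-friendly step must run."""
    placements = [place_step(p) for p in processors]
    if Placement.DSS in placements:
        return Placement.DSS
    if Placement.STAGE in placements:
        # Without a configured stage, stage-only steps fall back to DSS.
        return Placement.STAGE if stage_configured else Placement.DSS
    return Placement.SQL


if __name__ == "__main__":
    steps = ["rename_column", "parse_date"]
    print(place_recipe(steps, stage_configured=False).value)
    print(place_recipe(steps, stage_configured=True).value)
```

Under this model, the useful question for each recipe is which single step is keeping it on the DSS engine.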

When it comes to setting up stages in Snowflake:

  1. How many stages do you create?
    1. One per Snowflake login? If so, in which database do you create the stage?
    2. One per database? If so, in which schema do you put the stage?
    3. One per schema?
    4. Something else?
  2. When working across databases, what problems, if any, have you run into?
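One way to compare the layouts in question 1 is to list the stages and DDL each one would require. This is a minimal Python sketch that only builds SQL strings; the stage name `DSS_STAGE`, the home location `DSS_UTIL.PUBLIC`, the role `DSS_ROLE`, and the choice of the `PUBLIC` schema for the per-database layout are all hypothetical. `CREATE STAGE IF NOT EXISTS` and `GRANT READ, WRITE ON STAGE` are standard Snowflake syntax for internal stages, but which privileges Dataiku actually needs on the stage is an assumption to check against the Dataiku documentation (the role also needs `USAGE` on the containing database and schema).

```python
from typing import List, Tuple


def stage_locations(strategy: str,
                    schemas: List[Tuple[str, str]],
                    home: Tuple[str, str] = ("DSS_UTIL", "PUBLIC"),
                    stage_name: str = "DSS_STAGE") -> List[str]:
    """Return fully qualified stage names for a layout strategy.

    All default names are hypothetical placeholders.
    """
    if strategy == "per_login":
        # A single stage in one "home" database/schema.
        return [f"{home[0]}.{home[1]}.{stage_name}"]
    if strategy == "per_database":
        # One stage per database, assumed to live in its PUBLIC schema.
        databases = sorted({db for db, _ in schemas})
        return [f"{db}.PUBLIC.{stage_name}" for db in databases]
    if strategy == "per_schema":
        # One stage next to every schema DSS writes to.
        return [f"{db}.{schema}.{stage_name}" for db, schema in sorted(set(schemas))]
    raise ValueError(f"unknown strategy: {strategy}")


def stage_ddl(location: str, role: str) -> List[str]:
    """Build the DDL to create an internal stage and grant access to a role."""
    return [
        f"CREATE STAGE IF NOT EXISTS {location};",
        f"GRANT READ, WRITE ON STAGE {location} TO ROLE {role};",
    ]


if __name__ == "__main__":
    used = [("SALES", "RAW"), ("SALES", "MART"), ("HR", "PUBLIC")]
    for location in stage_locations("per_database", used):
        for statement in stage_ddl(location, role="DSS_ROLE"):
            print(statement)
```

The trade-off is visible in the output size: one stage per login keeps administration minimal, while per-database or per-schema stages keep files close to the data at the cost of more objects and grants to maintain.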

Operating system used: Dataiku Cloud

Answers

  • Turribeach

    Hi Tom, while I don't have any particular insight into this Snowflake feature, by far the biggest performance improvement you can get is to make sure that your recipes run in pushdown mode, i.e. that they execute in SQL. To check this, review which Prepare recipe processors the SQL engine supports on Snowflake, and use the Flow view "Recipe Engines" to see which recipes are not using the SQL engine.

  • tgb417

    We finally got stages working, at least for visual recipes. What the stage seems to do is allow a short list of visual recipe functions that cannot be expressed directly in SQL to still be pushed down and executed on the Snowflake warehouse (compute), rather than on the DSS server.

    https://doc.dataiku.com/dss/latest/preparation/engines.html#details-on-the-in-database-sql-engine

    I know that it is working, but I have not yet evaluated the actual performance improvement from these features.

    @Turribeach, hope you are doing well.
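Since the performance gain from the stage has not been measured in this thread, here is a minimal Python sketch of one way to compare two configurations: run each several times and compare median wall-clock times. `baseline` and `candidate` stand for hypothetical callables, for example wrappers that trigger the same recipe with the stage unset and set; how those runs are triggered is left open here and is not a Dataiku API. Taking the median reduces the influence of a slow first run (warehouse resume, caching) or an occasional outlier.

```python
import statistics
import time
from typing import Callable


def median_runtime(run: Callable[[], object], repeats: int = 5) -> float:
    """Call `run` `repeats` times and return the median wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline: Callable[[], object],
            candidate: Callable[[], object],
            repeats: int = 5) -> float:
    """Return baseline/candidate median time; a value above 1 means candidate is faster."""
    return median_runtime(baseline, repeats) / median_runtime(candidate, repeats)


if __name__ == "__main__":
    # Stand-ins for real runs: replace with calls that execute the recipe
    # without the stage (baseline) and with the stage (candidate).
    slow = lambda: time.sleep(0.05)
    fast = lambda: time.sleep(0.01)
    print(f"speedup: {speedup(slow, fast, repeats=3):.1f}x")
```

For a fair comparison, both configurations should run against the same input data and the same warehouse size.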
