Just a suggestion for an improvement to the Snowflake integration in DSS. It would be nice to have the opportunity to dynamically scale the warehouse up or down as part of a visual recipe, because there may be some heavy joins or other operations that require more computation power than the rest of the flow.
Currently I work around this by defining the scaling as separate steps in a scenario, but that also means I have to divide the scenario into more, smaller steps.
I think it could be part of the advanced tab of the recipe, where you would just issue a Snowflake ALTER WAREHOUSE command to scale it up for that one recipe. Of course it would also be nice to have the opportunity to run a post-recipe command to scale it back down, or have that happen automatically.
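For illustration, the pre- and post-recipe commands being suggested would look something like this (the warehouse name and sizes are hypothetical, not from an actual setup):

```sql
-- Pre-recipe: scale the warehouse up for the heavy join
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'XLARGE';

-- ... the visual recipe runs on the larger warehouse ...

-- Post-recipe: scale back down to control cost
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'SMALL';
```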
Thanks very much for the suggestion. I'm the product manager looking after our Snowflake integrations.
Indeed, this is something I have on my radar and have started to discuss with Snowflake. We do want to eventually help our users optimize the cost of compute, although we're not quite sure how yet.
If you can spare some time, I'd be interested in following up with you directly.
Hi, I support this topic as we are facing the same issue. Running pre-SQL or post-SQL would be very welcome (similar to what is possible when using the Impala engine). Currently the only way is to use a code/SQL recipe.
Thanks @dan_fejtek . Quick question for you both: if you are already changing the DW size with code in your flows, how do you manage concurrent execution of flows and the possible conflict between two flows trying to resize?
For context, Snowflake recommended that we not resize the DW within the flow, but instead allow switching to a different DW for each recipe.
I'm having some issues with my account and cannot log in to the one I created yesterday, so I created a new one 🙂
@JCR One more thought, still relevant to using different warehouse sizes. At the moment DSS will not let you use the "In database (SQL)" engine unless all source and target datasets of a recipe use the same DSS connection. In a sense, DSS forces developers to use a single DSS connection for the whole Snowflake DB.
Since the warehouse is defined on the DSS connection, I can imagine that changing the warehouse could also be enabled by allowing the SQL engine across different DSS connections, provided the connections share the same Snowflake account and user (which means they have the same privileges on the source and target connections).
The warehouse to use could then be picked from any of the recipe's source or target connections.
For example: a staging connection points to the STAGING schema using an S warehouse, and an analytics connection points to the ANALYTICS schema using an XL warehouse. When a recipe loads data from staging to analytics, you could pick whether to use the pre-defined S or XL warehouse.
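Such pre-defined warehouses would be created once by an admin; a sketch of what that could look like in Snowflake (warehouse names are made up for the example):

```sql
-- Small warehouse for the staging connection
CREATE WAREHOUSE IF NOT EXISTS staging_wh
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND = 60      -- suspend after 60 s of inactivity
  AUTO_RESUME = TRUE;    -- wake up automatically when queried

-- X-Large warehouse for the analytics connection
CREATE WAREHOUSE IF NOT EXISTS analytics_wh
  WAREHOUSE_SIZE = 'XLARGE'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;
```

With AUTO_SUSPEND/AUTO_RESUME and per-second billing, the larger warehouse only costs money while a recipe is actually running on it.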
Absolutely! The concept of connection for Snowflake is quite different from traditional databases. My sense is we need to revisit how we define connections to Snowflake in order to allow the enhancements discussed in this thread. We're looking into this.
That is actually my aim: not to use ALTER WAREHOUSE but USE WAREHOUSE. We plan to use pre-defined warehouses of different sizes for different jobs.
ALTER WAREHOUSE is, in my opinion, not really usable in concurrent flows (using the same WH), only in sequential jobs, which might also be a valid case. So perhaps leaving it to the developer and just enabling pre/post-SQL would work just fine.
@JCR I just found a very interesting workaround for running a recipe with a specific warehouse (different from the connection's default one). When you set up the recipe with all sources and the target and pick the "in database (SQL)" engine, you can go to the output dataset, Settings > Advanced tab, where there are "Pre-write statements" and "Post-write statements" fields. You actually CAN put "use warehouse xxx" there, and from quick testing it seems to work.
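Concretely, the two fields would contain something along these lines (warehouse names are hypothetical):

```sql
-- Pre-write statements: switch this session to the bigger warehouse
USE WAREHOUSE xl_wh;

-- Post-write statements: switch back to the connection's default
USE WAREHOUSE s_wh;
```

USE WAREHOUSE only changes which warehouse the current session runs on, so unlike ALTER WAREHOUSE it does not affect other flows running concurrently.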
Hey, we're just trying out Snowflake and your suggestion seems to work.
If the visual recipe is pushed down to Snowflake, both USE WAREHOUSE and ALTER WAREHOUSE work (so it is also possible to increase the size for a single visual recipe).
Sounds good, and please don't hesitate to open a thread when you find opportunities for improvements in the integration with Snowflake.
BTW: with 8.0.3 we released a feature that lets you select a warehouse via the advanced properties of the recipe.
In the recipe variables you can add:
As I was working on the documentation I realized some information is missing in this thread.
Specifically, the variables mentioned above need to be substituted in the connection definition. Here is what it looks like:
In the connection:
In the recipe properties
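The exact values aren't preserved in this thread, but the mechanism described is plain DSS variable substitution; a sketch under that assumption (the variable name `snowflake_warehouse` and the warehouse names are illustrative, not the literal values from the post):

```
# In the connection settings, reference a variable in the Warehouse field:
Warehouse: ${snowflake_warehouse}

# Default value, defined at e.g. project level:
{"snowflake_warehouse": "S_WH"}

# Per-recipe override, in the recipe's advanced properties / variables:
{"snowflake_warehouse": "XL_WH"}
```

Any recipe that does not override the variable falls back to the default, so only the heavy recipes need explicit settings.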
As always, feedback welcome!
Thanks for the clarification! Where would one store the connection's default variables? In the DSS system variables?
I know this is getting a little off-topic, but what's the variable precedence?
System > (User?) > Project > (Scenario?) > Recipe?
Good question: reading left to right, where left supersedes right
Recipe > Scenario > Project > User > Global
For each of those artifacts there is a place where you can define the variables.
This topic provides general guidelines and best practices for using virtual warehouses in Snowflake to process queries. It does not provide specific or absolute numbers, values, or recommendations because every query scenario is different and is affected by numerous factors, including number of concurrent users/queries, the number of tables being queried, and data size and composition, as well as your specific requirements for warehouse availability, latency, and cost.
It also does not cover warehouse considerations for data loading, which are covered in another topic (see the sidebar).
The keys to using warehouses effectively and efficiently are:
Experiment with different types of queries and different warehouse sizes to determine the combinations that best meet your specific query needs and workload.
Don’t focus on warehouse size. Snowflake utilizes per-second billing, so you can run larger warehouses (Large, X-Large, 2X-Large, etc.) and simply suspend them when not in use.