Snowpark with Dataiku
 
While going through one of Dataiku's blogs, I found out about the integration with Snowpark for Python.
Just wondering how to get started with this. Does this mean using the snowflake-snowpark-python library, or can we somehow use Snowpark in recipes?
Also, to what extent would it optimize my operations, say on a dataset with 1 million rows and 30 columns?
Thanks in advance,
Madhuleena
Best Answer
- 
StephenWagner Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered, Product Ideas Manager Posts: 6 Dataiker
Hi @nmadhu20, this tutorial explains how to get started with Snowpark in Dataiku and covers the details of using it:
 Using Snowpark Python in Dataiku: basics
Answers
- 
importthepandas Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 115 Neuron
Hi Team,
Bumping this as we are beginning to explore Snowpark with Dataiku. Snowpark looks quite powerful, especially its Python APIs, if you happen to be a Snowflake customer.
A question on the Dataiku integration: with Spark, we can build our code envs for Spark and have everything available at runtime, which is nice. With Snowflake stored procs and Snowpark, you need to provide the Python package spec inline. I'm assuming this is manual? Meaning we'll need to make sure each stored proc carries the spec, which will be overhead outside of DSS?
Also, considering most packages are handled via Anaconda, I'm assuming config will be needed to install from e.g. an internal PyPI?
- 
Hi,
Yes, you’ll need to list the Python packages you want in the sproc inline. You can do this with the Snowpark session object or when you define the sproc:
session.add_packages()
or
@sproc(packages=["pandas", "xgboost==1.5.0"])
Snowflake maintains a dedicated Anaconda repo that all Snowpark code can leverage, so there is no additional admin overhead as long as you’re using one of the packages listed here: https://repo.anaconda.com/pkgs/snowflake/
You can submit requests for additional packages through the Snowflake community, and I’ve found them to be very responsive.
 If you want to use a custom package, there’s a procedure for it: https://medium.com/snowflake/using-other-python-packages-in-snowpark-a6fd75e4b23a
 Hope this helps!
 Pat
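To make the two options above a bit more concrete, here is a minimal sketch of declaring packages inline when registering a stored procedure. It is not taken from the Dataiku tutorial: the helper names `package_specs` and `register_row_count_sproc` are hypothetical, the row-count body is just a placeholder, and the sketch assumes you already have a Snowpark `Session` (for example, one obtained through Dataiku's Snowpark integration). Including `snowflake-snowpark-python` in the package list is an assumption made to be safe.

```python
from typing import Dict, List, Optional


def package_specs(requirements: Dict[str, Optional[str]]) -> List[str]:
    """Turn {"name": "version" or None} into spec strings such as "xgboost==1.5.0"."""
    return [
        name if version is None else f"{name}=={version}"
        for name, version in requirements.items()
    ]


def register_row_count_sproc(session, specs: List[str]):
    """Register a temporary stored procedure whose packages are declared inline.

    `session` is assumed to be an existing snowflake.snowpark.Session.
    """
    # Imported lazily so package_specs() works even without Snowpark installed.
    from snowflake.snowpark.functions import sproc
    from snowflake.snowpark.types import IntegerType, StringType

    @sproc(
        packages=specs,              # the inline package list from the answer above
        session=session,
        return_type=IntegerType(),
        input_types=[StringType()],
    )
    def row_count(sp_session, table_name):
        # Placeholder body: runs inside Snowflake, next to the data.
        return sp_session.table(table_name).count()

    return row_count


# Hypothetical package list; pandas and xgboost mirror the example in the answer.
SPECS = package_specs(
    {"snowflake-snowpark-python": None, "pandas": None, "xgboost": "1.5.0"}
)
```

With a live session you would call `register_row_count_sproc(session, SPECS)` once and then invoke the returned procedure with a table name. Alternatively, `session.add_packages(*SPECS)` sets the same list at the session level so that later registrations pick it up.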
- 
importthepandas Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 115 Neuron
You rock, @pmasiphelps!
Hopefully this will help me write 80% of my code in Snowflake.
