Snowflake dataset override default connection details

FabriceD
FabriceD Registered Posts: 4

Hi

I was wondering what the exact parameter names are that need to be added to 'specificSettings' so that a new managed dataset on Snowflake is materialized in a catalog and schema different from the connection's defaults.

I added a screenshot of the UI (I am looking for the database and schema fields).
I want to set those through the Python SDK (under the hood, the following API is used: https://doc.dataiku.com/dss/api/12/rest/#datasets-datasets-post-1).
But I cannot seem to find the correct parameters...
I tried various combinations of 'catalog', 'schema', 'database', 'snowflake_catalog', ...

PS: If there is a better pattern for creating managed datasets through the SDK, please let me know.

Best Answer

  • FabriceD
    FabriceD Registered Posts: 4
    edited July 17 Answer ✓

    I was in contact with the Dataiku support team and got the answer needed:

    The exact names of the 'specificSettings' parameters are 'overrideSQLCatalog' and 'overrideSQLSchema':
    
    
    builder.creation_settings['specificSettings']['overrideSQLCatalog'] = new_dataset_snowflake_catalog
    builder.creation_settings['specificSettings']['overrideSQLSchema'] = new_dataset_snowflake_schema
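
For reference, the two lines above could be wrapped as in the sketch below. It assumes `project` is a `DSSProject` obtained from the `dataikuapi` client. The helper names `with_sql_overrides` and `create_dataset_with_overrides` are hypothetical and not part of the SDK. Only the `overrideSQLCatalog` and `overrideSQLSchema` keys come from the answer.

```python
def with_sql_overrides(creation_settings, catalog=None, schema=None):
    """Add the catalog/schema override keys to a creation-settings dict.

    Hypothetical helper: it only manipulates a plain dict, so it can be
    used on builder.creation_settings or checked in isolation.
    """
    specific = creation_settings.setdefault("specificSettings", {})
    if catalog:
        specific["overrideSQLCatalog"] = catalog
    if schema:
        specific["overrideSQLSchema"] = schema
    return creation_settings


def create_dataset_with_overrides(project, dataset_name, connection_name,
                                  catalog=None, schema=None):
    """Create a managed dataset on `connection_name` with overridden location.

    Assumes `project` behaves like a dataikuapi DSSProject
    (new_managed_dataset / with_store_into / create, as used in this thread).
    """
    builder = project.new_managed_dataset(dataset_name)
    builder.with_store_into(connection_name)
    with_sql_overrides(builder.creation_settings, catalog, schema)
    # overwrite=True deletes an existing dataset of the same name first
    return builder.create(overwrite=True)
```

Leaving `catalog` or `schema` as `None` keeps the connection default for that field, so either one can be overridden on its own.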

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,980 Neuron

    Please post your code using a code block (the </> icon) so it can be copy/pasted.

    The code looks good to me. What exactly do you see on the created dataset? Is this an unmanaged (input) or managed (output) dataset that you are trying to create?

    Finally, the link you posted is for the REST API, not the Python API, which you seem to be using.

  • FabriceD
    FabriceD Registered Posts: 4
    edited July 17

    Hi


    Here is the minimal example code to do what I want.

    from dataikuapi.dssclient import DSSClient

    host = ""
    api_key = ""
    project_key = ""
    connection_name = ""
    new_dataset_name = ""
    new_dataset_snowflake_schema = ""
    new_dataset_snowflake_catalog = ""

    # create client, get the project and build new dataset
    client = DSSClient(host, api_key, insecure_tls=True)
    project = client.get_project(project_key)
    builder = project.new_managed_dataset(new_dataset_name)
    # add connection details and try to override the defaults
    builder.with_store_into(connection_name)
    builder.creation_settings['specificSettings']['catalog'] = new_dataset_snowflake_catalog
    builder.creation_settings['specificSettings']['schema'] = new_dataset_snowflake_schema
    # create (replacing any existing dataset) and retrieve the created dataset
    builder.create(overwrite=True)
    dataset = project.get_dataset(new_dataset_name)
    dataset.get_settings().get_raw()
    # --> params contain catalog and schema equal to the defaults of the connection

    Indeed, the URL was for the REST API, but the package calls that API under the hood.
    The 'create' function has the following implementation (snippet from the dataikuapi package), which is why I referenced the REST API documentation. There is no documentation available about which specificSettings can be used.

        def create(self, overwrite=False):
            """
            Executes the creation of the managed dataset according to the selected options
            
            :param overwrite: If the dataset being created already exists, delete it first (removing data), defaults to False
            :type overwrite: bool, optional
    
            :returns: the newly created dataset
            :rtype: :class:`DSSDataset`
            """
            if overwrite and self.already_exists():
                self.project.get_dataset(self.dataset_name).delete(drop_data = True)
    
            self.project.client._perform_json("POST", "/projects/%s/datasets/managed" % self.project.project_key,
                body = {
                    "name": self.dataset_name,
                    "creationSettings":  self.creation_settings
            })
            return DSSDataset(self.project.client, self.project.project_key, self.dataset_name)
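
To make the snippet above concrete, here is a rough sketch of the path and body that `create` would send once the override keys from the accepted answer are set. The function name `managed_dataset_request` and the values `MYPROJECT`, `my_dataset`, `MY_DB` and `MY_SCHEMA` are placeholders for illustration. A real builder's `creation_settings` also carries the connection chosen with `with_store_into`, which is left out here.

```python
import json


def managed_dataset_request(project_key, dataset_name, creation_settings):
    """Build the (path, body) pair that the quoted create() method posts.

    Hypothetical helper mirroring the quoted snippet; it sends nothing.
    """
    path = "/projects/%s/datasets/managed" % project_key
    body = {
        "name": dataset_name,
        "creationSettings": creation_settings,
    }
    return path, body


# Placeholder values; only the two override keys come from the thread.
creation_settings = {
    "specificSettings": {
        "overrideSQLCatalog": "MY_DB",
        "overrideSQLSchema": "MY_SCHEMA",
    }
}

path, body = managed_dataset_request("MYPROJECT", "my_dataset", creation_settings)
print("POST", path)
print(json.dumps(body, indent=2))
```

Printing `builder.creation_settings` just before calling `create` gives the same kind of view on a real instance and helps confirm which keys are actually being sent.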

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,980 Neuron

    Can you post a screenshot of the settings of your Snowflake connection? (You can hide the sensitive parts.)

  • FabriceD
    FabriceD Registered Posts: 4

    Unfortunately I cannot as I am not an admin on the portal.
    I could ask my admin but it might take a while.

    Connecting works, and writing through Python and the UI works as well, but overriding the default schema only works in the UI...

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,980 Neuron

    Many thanks for posting it back and sharing with the community.
