SQL intermediate dataset isolation

Vin · February 2024

I'm trying to administer a Dataiku platform.

Until recently we weren't cleared, for security reasons, to use SQL (Postgres) datasets, so I've only just started experimenting with them.

I've noticed that, by default, all the intermediate tables are visible to everyone sharing the connection.

There are quite a few problems with that: Dataiku's whole security layer goes out the window, and everybody is bogged down with everyone else's intermediate datasets.

There are good reasons to use SQL tables as intermediate datasets, if only to use SQL recipes, which are often more efficient than the DSS engine, but this visibility issue is very limiting.

Is this solvable?

If I want some degree of isolation between projects, should I create a specific schema or connection for each project?

Am I missing something?

Turribeach · February 2024

All the permissions in Dataiku are managed via groups (and users in some cases, but you shouldn't really be using user permissions). So to have isolation between your teams, you need a group defined for each separate team, with folders and connections permissioned to that group. Then add the users to the relevant groups and they won't be able to see any projects, connections or data from the other groups/teams. In addition, you can make your groups external by using LDAP, so you can use existing AD groups or new ones to segment your users in Dataiku.

Vin · February 2024

Hi,

I mostly get that.

My problem is how to manage isolation for SQL datasets (we're using Postgres).

With the default parameters, everybody sees all the tables, which can create confidentiality and organisational problems.

I don't think I have that problem with CSV datasets, but for some tasks SQL datasets are simply better.

What is the best practice?

Is it having a schema and/or connection per project (or group of projects)? Something else?

Best regards
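
For illustration only, here is a minimal sketch of what the "one schema per project" option could look like on the Postgres side. It only builds the DDL statements as strings and does not connect to any database. The `dss_` schema prefix, the project keys and the role names are all hypothetical, and it assumes each project (or group of projects) gets its own Dataiku connection that logs in with a dedicated Postgres role and is permissioned to the matching Dataiku group, as described in the reply above.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def isolation_ddl(project_key: str, login_role: str) -> List[str]:
    """Build DDL giving one login role its own schema for one project.

    The "dss_" prefix and the role name are hypothetical conventions,
    not something Dataiku or Postgres requires.
    """
    schema = quote_ident("dss_" + project_key.lower())
    role = quote_ident(login_role)
    return [
        # One schema per project (or per group of projects).
        f"CREATE SCHEMA IF NOT EXISTS {schema};",
        # Defensive: make sure no other role inherits access via PUBLIC.
        f"REVOKE ALL ON SCHEMA {schema} FROM PUBLIC;",
        # Only the role used by this project's connection may use the schema.
        f"GRANT USAGE, CREATE ON SCHEMA {schema} TO {role};",
    ]


if __name__ == "__main__":
    # Hypothetical project keys and role names, for illustration only.
    for key, role in [("SALES", "dss_sales_rw"), ("HR", "dss_hr_rw")]:
        for statement in isolation_ddl(key, role):
            print(statement)
```

Whether one schema per project or one per team is the better granularity depends on how many connections you are willing to maintain; the sketch works the same way for either.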
